Image created with gemini-3.1-flash-image-preview with claude-sonnet-4-5. Image prompt: 1980s NORAD war room CRT monitor showing cascading green phosphor text of code with red glowing sections spreading like a virus, dark silhouette of operator reaching toward screen, large bold red sans-serif text reading LLAMA across top, amber and blue wireframe graphics in background, cinematic lighting, high contrast, retro computer terminal aesthetic
18,000 tokens/sec, even for Llama 3.1 8B, is ridiculous. Even a “dumb” model like Llama 3.1 would be incredibly useful at this speed. It works by merging storage and compute, permanently etching the model parameters directly into the physical transistors of the chip. Demo link
https://x.com/_philschmid/status/2025830254753853843
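To get a feel for what 18,000 tokens/sec means in practice, here is a rough back-of-the-envelope sketch. Only the 18,000 figure comes from the post above; the 100 tokens/sec baseline and the response lengths are made-up numbers for comparison, and the calculation ignores prompt processing and network latency.

```python
def generation_time_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Return the time needed to emit num_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


QUOTED_RATE = 18_000.0   # tokens/sec, the figure quoted in the post
BASELINE_RATE = 100.0    # tokens/sec, hypothetical baseline (assumption, not a benchmark)

# Hypothetical response lengths: a short answer, a long document, a large batch job.
for num_tokens in (500, 4_000, 100_000):
    fast = generation_time_seconds(num_tokens, QUOTED_RATE)
    slow = generation_time_seconds(num_tokens, BASELINE_RATE)
    print(f"{num_tokens:>7} tokens: {fast:.3f} s at quoted rate, {slow:.1f} s at baseline")
```

At that rate the model's output stops being the bottleneck for most pipelines, which is why even a weaker model becomes interesting.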
ollama run lfm2:24b-a2b
.@liquidai’s latest on-device model is here! It’s the largest LFM2 model yet, designed to run fast on device, and it fits on devices with 32GB of unified memory.
https://x.com/ollama/status/2026305296709173535
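A quick sanity check on the 32GB claim, as a rough sketch: it assumes the "24b" in the model tag means roughly 24 billion total parameters, and it counts only the weights (no KV cache, activations, or OS overhead). The bit widths are illustrative; the post does not say which quantization the Ollama build uses.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


def fits_in_memory(num_params: float, bits_per_weight: float, budget_gb: float) -> bool:
    """True if the weights alone fit within budget_gb (ignores runtime overhead)."""
    return weight_memory_gb(num_params, bits_per_weight) <= budget_gb


TOTAL_PARAMS = 24e9   # assumption: read off the "24b" in the tag lfm2:24b-a2b
BUDGET_GB = 32.0      # unified memory figure from the post

# Illustrative bit widths; the actual quantization is not stated in the post.
for bits in (16, 8, 4):
    size = weight_memory_gb(TOTAL_PARAMS, bits)
    verdict = "fits" if fits_in_memory(TOTAL_PARAMS, bits, BUDGET_GB) else "does not fit"
    print(f"{bits:>2}-bit weights: {size:.1f} GB, {verdict} in {BUDGET_GB:.0f} GB")
```

Under these assumptions, 16-bit weights would overflow a 32GB machine, so the claim only works with a quantized build.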




