Ethan B. Holland

Over 55,600 manually organized AI links and counting

Llama: AI News Week Ending 02/27/2026

February 27, 2026

Image created with gemini-3.1-flash-image-preview with claude-sonnet-4-5. Image prompt: 1980s NORAD war room CRT monitor showing cascading green phosphor text of code with red glowing sections spreading like a virus, dark silhouette of operator reaching toward screen, large bold red sans-serif text reading LLAMA across top, amber and blue wireframe graphics in background, cinematic lighting, high contrast, retro computer terminal aesthetic

18,000 tokens/sec, even for Llama 3.1 8B, is ridiculous. Even a “dumb” model like Llama 3.1 would be incredibly useful at this speed. It works by merging storage and compute, permanently etching the model parameters directly into the physical transistors of the chip. Demo link
https://x.com/_philschmid/status/2025830254753853843
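
To put the quoted throughput in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the 18,000 tokens/sec figure is sustained for the whole generation and ignores prompt processing and network latency, neither of which the post describes. The function name and the sample response sizes are illustrative only.

```python
# Rough arithmetic only: assumes a constant decode rate and nothing else.
TOKENS_PER_SECOND = 18_000  # figure quoted in the post above


def generation_seconds(num_tokens: int,
                       tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Return the time needed to emit num_tokens at a constant rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Hypothetical response sizes: a short answer, a long answer, a long document.
for n in (500, 4_000, 100_000):
    print(f"{n:>7} tokens -> {generation_seconds(n):.3f} s")
```

At that rate, even a response of tens of thousands of tokens would finish in a few seconds, which is the point the post is making.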

ollama run lfm2:24b-a2b
.@liquidai’s latest on-device model is here! It’s the largest LFM2 model yet, designed to run fast on device, and it fits on devices with 32GB of unified memory.
https://x.com/ollama/status/2026305296709173535
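
Besides the `ollama run` command, a locally running Ollama server can also be queried over its REST API. The following is a minimal sketch, assuming Ollama is serving on its default local port (11434) and that the `lfm2:24b-a2b` model has already been pulled. The helper names `build_payload` and `generate` are hypothetical, introduced here for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "lfm2:24b-a2b"  # model tag taken from the post above


def build_payload(prompt: str, model: str = MODEL) -> dict:
    """Build the JSON body for a single, non-streaming generate request."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt: str, model: str = MODEL, url: str = OLLAMA_URL) -> str:
    """Send the prompt to the local server and return the generated text."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the server running, `print(generate("Summarize this week's AI news in one sentence."))` would print the model's reply. Without a running server, the call raises a connection error.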