Image created with gemini-3.1-flash-image-preview, with the prompt drafted by claude-sonnet-4-5. Image prompt: Wide static shot of a real llama tethered in a crumbling concrete courtyard of a half-demolished Chinese electronics factory, scattered circuit boards and open-source hardware schematics on weathered walls, overcast flat daylight, muted grays and washed blues, observational realism, decelerated time, large white text overlay reading LLAMA positioned like Chinese cinema poster title, documentary stillness, human-scale intimacy, postindustrial decay, digital-poetic texture not glossy.

Taalas runs Llama 3 8B at 16,000 tokens per second per user. That's almost an order of magnitude faster than even SRAM-based systems like Cerebras. The key idea: each chip is specialized to a single model. The chip is the model. The chat demo is pretty wild: https://x.com/awnihannun/status/2024671348782711153
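A quick back-of-envelope check on the "almost an order of magnitude" claim. The 16,000 tokens-per-second figure is from the post; the ~1,800 tokens-per-second figure for Cerebras serving Llama 3.1 8B is an assumed benchmark number, so treat the exact ratio as illustrative:

```python
# Rough speedup and per-response latency comparison.
taalas_tps = 16_000    # tokens/sec per user, from the post
cerebras_tps = 1_800   # assumed figure for Llama 3.1 8B on Cerebras

speedup = taalas_tps / cerebras_tps
print(f"Speedup: {speedup:.1f}x")  # ~8.9x, just under a full order of magnitude

# What that means for a 1,000-token response:
for name, tps in [("Taalas", taalas_tps), ("Cerebras", cerebras_tps)]:
    print(f"{name}: {1_000 / tps * 1_000:.0f} ms per 1,000 tokens")
```

At these rates a full 1,000-token answer arrives in well under a tenth of a second, which is why the chat demo feels instantaneous.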

Georgi's llama.cpp really kicked off the whole local-model movement, in my opinion: it made the original Llama usable on personal computers. I wrote about it back in March 2023: https://x.com/simonw/status/2024855027517702345

Large language models are having their Stable Diffusion moment https://simonwillison.net/2023/Mar/11/llama/#llama-cpp

Ollama 0.16.3 is out with Cline and Pi integrations out of the box. Try it with "ollama launch cline" for Cline (@cline) or "ollama launch pi" for Pi. https://x.com/ollama/status/2024978932127187375

BREAKING: Llama.cpp joins Hugging Face 🤯 https://x.com/victormustar/status/2024842175532413016

GGML and llama.cpp join HF to ensure the long-term progress of Local AI https://huggingface.co/blog/ggml-joins-hf


Discover more from Ethan B. Holland
