Image created with Flux Pro v1.1 Ultra. Image prompt: Assembly instruction diagram for a modular hiking backpack frame system, outdoor gear technical style, tan canvas and forest green colors, topographic map background, “LLAMA” in outdoor brand font, weight distribution points marked, attachment system detailed
So so so cool. Llama 1B batch-one inference in a single CUDA kernel, deleting the synchronization boundaries imposed by breaking the computation into a series of kernels called in sequence. The *optimal* orchestration of compute and memory is only achievable this way. https://x.com/karpathy/status/1927506788527591853
Ollama can now think! 🤔 For thinking models, and especially for very thoughtful models like DeepSeek-R1-0528, Ollama can separate the thoughts from the response. Thinking can also be disabled, which is useful for getting a direct response. This works across https://x.com/ollama/status/1928543644090249565
Just FYI, all the reports from our RL experiments have not been on Qwen; they've been on Llama (DeepHermes 8B) – so hopefully that gives some additional assurance about the impact RL can have, and that it's not just god-mode Qwen math improvements arising from randomness. https://x.com/i/web/status/1928184393035559191
RAG is dead, long live agentic retrieval! At LlamaIndex we’ve been saying for a long time that naive RAG is not enough for a modern application. Following from that conviction, we’ve built agentic strategies directly into LlamaCloud that you can adopt with just a few lines of https://x.com/llama_index/status/1928142249935917385