Stable Artisan: Media Generation and Editing on Discord — Stability AI
Karpathy gives an update on llm.c. llm.c is a project that simplifies the training of large language models by using the low-level programming language C, reducing millions of lines of code in Python and PyTorch to just around 1,000 lines. This approach makes the code more compact and educational, but it sacrifices flexibility and initial speed optimizations.
State of the Union [May 3, 2024] · karpathy/llm.c · Discussion #344 · GitHub
“Another potentially big paper: fine-tuning – a key way of customizing AI models on specialized data – doesn’t seem to work very well if you actually want the model to learn new things. Fine-tuned models struggle with the new knowledge and hallucinate more. Big context windows FTW?
“The rise of AI means *great* software is necessary again. After spending all day looking at demos so far today, I can say resolutely: Founders must learn how to make good user experiences again. I can tell immediately. I only want to fund people who can make great software.” / X
“Scale AI just released new research uncovering significant ‘overfitting’ of certain LLMs on popular AI benchmarks. Mistral and Phi were under-performers, while GPT-4, Claude, Gemini, and Llama all stood their ground
[2404.19756] KAN: Kolmogorov-Arnold Networks
[2405.02793] ImageInWords: Unlocking Hyper-Detailed Image Descriptions
[2405.04788v1] DiffMatch: Visual-Language Guidance Makes Better Semi-supervised Change Detector
[2405.05254] You Only Cache Once: Decoder-Decoder Architectures for Language Models
[2405.05904] Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
“I added arena elo to my LLM pricing table. The score is pulled from @huggingface. Initial takeaways: • llama 3 70b is game changing • haiku remains excellent value • gemini 1.5 pro is exceptional • gpt-4 turbo reigns supreme. My table is sorted by arena elo, desc. Happy to
“At 1M context, a ≈250B MLA model like DS-V2 uses only 34.6GB for cache. We are entering a regime where saved kv-caches with really-many-shot examples (or learning a language, or…) become a sensible alternative to finetuning.
“xLSTM: Extended Long Short-Term Memory. Attempts to scale LSTMs to billions of parameters using the latest techniques from modern LLMs and mitigating common limitations of LSTMs. To enable LSTMs the ability to revise storage decisions, they introduce exponential gating and a new
“xLSTM: Extended Long Short-Term Memory. Exponential gating and modified memory structures boost xLSTM capabilities to perform favorably when compared to SotA Transformers and State Space Models, both in performance and scaling.
“How is an LLM actually using the info given to it in its context? Is it misinterpreting anything or making things up? Introducing ContextCite: a simple method for attributing LLM responses back to the context:
“MLPs are so foundational, but are there alternatives? MLPs place activation functions on neurons, but can we instead place (learnable) activation functions on weights? Yes, we KAN! We propose Kolmogorov-Arnold Networks (KAN), which are more accurate and interpretable than MLPs.🧵
Kolmogorov-Arnold Network is just an MLP : r/MachineLearning
“📄Now preprinted – Part II of our philosophical introduction to language models! While Part I focused on continuity w/ classical debates, Part II is more forward-looking and covers new issues. 1/5
“Synthetic data generation cannot expand the manifold (the model’s knowledge), but it can still be very useful. It can act as a training data denoising process that makes your data better suitable for curve fitting.” / X
Consistency Large Language Models: A Family of Efficient Parallel Decoders | Hao AI Lab @ UCSD
DOCCI: https://google.github.io/docci/
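
A quick sanity check on the KV-cache figure quoted above ("At 1M context, a ≈250B MLA model like DS-V2 uses only 34.6GB for cache"). The tweet does not show its arithmetic, so the sketch below is a guess at how the number arises. The layer count (60), compressed latent width (512), decoupled RoPE key width (64), and the one-byte-per-element cache are all assumptions based on DeepSeek-V2's published configuration, not values stated in the links here.

```python
def kv_cache_bytes(num_layers, cached_dims_per_token, num_tokens, bytes_per_element):
    """Total cache size when each layer stores one vector per token."""
    return num_layers * cached_dims_per_token * num_tokens * bytes_per_element


# Assumed DeepSeek-V2 settings (not given in the quoted tweet).
LAYERS = 60            # transformer layers
LATENT_DIM = 512       # compressed KV latent cached by MLA
ROPE_DIM = 64          # decoupled RoPE key cached alongside the latent
TOKENS = 1_000_000     # "1M context"

dims = LATENT_DIM + ROPE_DIM  # 576 cached values per token per layer

cache_8bit = kv_cache_bytes(LAYERS, dims, TOKENS, bytes_per_element=1)
cache_16bit = kv_cache_bytes(LAYERS, dims, TOKENS, bytes_per_element=2)

print(f"8-bit cache:  {cache_8bit / 1e9:.1f} GB")   # 34.6 GB
print(f"16-bit cache: {cache_16bit / 1e9:.1f} GB")  # 69.1 GB
```

Under these assumptions the quoted 34.6GB matches an 8-bit cache; a 16-bit cache would be roughly double that.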

Be Sure To Read This Week’s Main Post:
This week’s executive overview and top links are here:
AI News #32: Week Ending 05/10/2024 with Executive Summary and Top 70 Links
The post you just read is a deep-dive extension of my weekly newsletter, This Week In AI, an executive summary of the top things to know in AI. Each week, I create an accessible overview so that laypeople can feel confident they are conversant with the week’s AI developments. I include a curated list of must-click links of the week, to offer everyone a hands-on opportunity to explore the most intriguing updates in artificial intelligence across various categories, including robotics, imagery, video, AR/VR, science, ethics, and more. Beyond the overview, I post these topic-based deeper dives (below). If you haven’t read this week’s overview, I recommend starting there.
- Agents/Copilots
- Amazon
- Apple
- Artificial General Intelligence (AGI)
- Augmented and Virtual Reality (AR/VR)
- Autonomous Vehicles
- AI Audio
- Business and Enterprise AI
- Chips and Hardware
- Consumer Products
- Education
- Ethics/Legal Security
- Images/Photos
- International AI News
- Locally Run AI Models
- Mobile
- Meta
- Microsoft
- OpenAI
- Open Source
- Podcasts/YouTube
- Publishing and News
- Retrieval-Augmented Generation (RAG) News
- Robots and Embodiment
- Science and Medicine
- Video
- Vision/Multimodality
- X/Twitter/Grok
- Tech and Development
Credits/Sources

Most of these weekly links come from just a few prolific oversharing sources. Please follow them, as they work hard to find the news each week and they make it a lot easier for me to compile.
- Robert Scoble: https://x.com/Scobleizer
- Ethan Mollick: https://www.linkedin.com/in/emollick/
- Alan Thompson: https://lifearchitect.ai/
- Theoretically Media: https://www.youtube.com/@TheoreticallyMedia
- The Rundown: https://www.therundown.ai/
- Bilawal Sidhu: https://twitter.com/bilawalsidhu/
- TLDR: https://tldr.tech/ai
- Jeremiah Owyang: https://twitter.com/jowyang
- Nick St. Pierre: https://twitter.com/nickfloats
- Dr. Jim Fan: https://twitter.com/DrJimFan
- All About AI: https://www.youtube.com/@AllAboutAI
- Marshall Kirkpatrick: https://aitimetoimpact.com/
- AI News (Smol Talk): https://buttondown.email/ainews/archive/
For previous issues, please visit the archives!

Thanks for reading!




