Ethan B. Holland

Over 56,600 manually organized AI links and counting

the inside architecture of a large language model --ar 5:3 --style raw

Tech and Development: Week Ending 05/10/2024

May 10, 2024

Stable Artisan: Media Generation and Editing on Discord — Stability AI

https://stability.ai/news/stable-artisan

Karpathy gives an update on llm.c, a project that simplifies the training of large language models by writing it directly in the low-level programming language C. It reduces millions of lines of Python and PyTorch code to around 1,000 lines of C. This makes the code more compact and educational, at the cost of flexibility and some initial speed optimizations.
State of the Union [May 3, 2024] · karpathy/llm.c · Discussion #344 · GitHub

https://github.com/karpathy/llm.c/discussions/344

Another potentially big paper: fine-tuning – a key way of customizing AI models on specialized data – doesn't seem to work very well if you actually want the model to learn new things.

Fine-tuned models struggle with the new knowledge & hallucinate more. Big context windows FTW? pic.twitter.com/dATYXNscHX
— Ethan Mollick (@emollick) May 10, 2024

The rise of AI means *great* software is necessary again. After spending all day looking at demos so far today, I can say resolutely:

Founders must learn how to make good user experiences again.

I can tell immediately. I only want to fund people who can make great software.
— Garry Tan (@garrytan) May 5, 2024

Scale AI just released new research uncovering significant ‘overfitting’ of certain LLMs on popular AI benchmarks.

Mistral and Phi were under-performers, while GPT-4, Claude, Gemini, and Llama all stood its ground https://t.co/HqX9HQJ2tR
— Brett Adcock (@adcock_brett) May 5, 2024

[2404.19756] KAN: Kolmogorov-Arnold Networks

https://arxiv.org/abs/2404.19756

[2405.02793] ImageInWords: Unlocking Hyper-Detailed Image Descriptions

https://arxiv.org/abs/2405.02793

[2405.04788v1] DiffMatch: Visual-Language Guidance Makes Better Semi-supervised Change Detector

https://arxiv.org/abs/2405.04788v1

[2405.05254] You Only Cache Once: Decoder-Decoder Architectures for Language Models

https://arxiv.org/abs/2405.05254

[2405.05904] Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?

https://arxiv.org/abs/2405.05904

I added arena elo to my LLM pricing table

The score is pulled from @huggingface

Initial takeaways:
• llama 3 70b is game changing
• haiku remains excellent value
• gemini 1.5 pro is exceptional
• gpt-4 turbo reigns supreme

My table is sorted by arena elo, desc.

Happy to… pic.twitter.com/F9WNU0WM6F
— virat (@virattt) May 10, 2024

At 1M context, a ≈250B MLA model like DS-V2 uses only 34.6GB for cache.
We are entering a regime where saved kv-caches with really-many-shot examples (or learning a language, or…) become a sensible alternative to finetuning. https://t.co/6Yr1m4XH1s pic.twitter.com/nVp3L9rbi1
— Teortaxes▶️ (@teortaxesTex) May 7, 2024
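The 34.6GB figure in the post above can be roughly reproduced with back-of-the-envelope arithmetic. The sketch below is not from the post. It assumes DeepSeek-V2's MLA caches one 512-dimensional compressed latent plus one 64-dimensional RoPE key per token per layer, across 60 layers, and that each cached element takes 1 byte (for example, an 8-bit cache). All four numbers are assumptions; at 2 bytes per element the total would double.

```python
# Rough KV-cache size estimate for an MLA-style model at long context.
# Every constant here is an assumption chosen to match DeepSeek-V2's
# published configuration, not a value stated in the post.
KV_LATENT_DIM = 512        # assumed compressed KV latent size per layer
ROPE_KEY_DIM = 64          # assumed decoupled RoPE key size per layer
NUM_LAYERS = 60            # assumed number of transformer layers
BYTES_PER_ELEMENT = 1      # assumed 8-bit cache; use 2 for bf16/fp16
CONTEXT_TOKENS = 1_000_000


def mla_cache_bytes(tokens, layers, latent_dim, rope_dim, bytes_per_element):
    """Bytes needed to cache `tokens` positions when each layer stores
    one latent vector and one RoPE key per token."""
    per_token = layers * (latent_dim + rope_dim) * bytes_per_element
    return tokens * per_token


total = mla_cache_bytes(
    CONTEXT_TOKENS, NUM_LAYERS, KV_LATENT_DIM, ROPE_KEY_DIM, BYTES_PER_ELEMENT
)
print(f"{total / 1e9:.2f} GB")  # prints "34.56 GB"
```

Under these assumptions each token costs 34,560 bytes of cache, which is why storing a very long, example-filled context becomes cheap enough to consider as an alternative to fine-tuning.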

xLSTM: Extended Long Short-Term Memory

Attempts to scale LSTMs to billions of parameters using the latest techniques from modern LLMs and mitigating common limitations of LSTMs.

To enable LSTMs the ability to revise storage decisions, they introduce exponential gating and a new… pic.twitter.com/Kby7nW9nnB
— elvis (@omarsar0) May 8, 2024

xLSTM: Extended Long Short-Term Memory

Exponential gating and modified memory structures boost xLSTM capabilities to perform favorably when compared to SotA Transformers and State Space Models, both in performance and scaling. https://t.co/HW77SEohJA pic.twitter.com/7DNRpHYfJT
— Aran Komatsuzaki (@arankomatsuzaki) May 8, 2024
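To make "exponential gating" in the two xLSTM posts above more concrete, here is a minimal scalar sketch of one recurrent step, assuming the sLSTM-style update described in the paper: exponential input and forget gates, a normalizer state, and a running-max stabilizer that keeps the exponentials from overflowing. The function name `slstm_step` and the choice to pass gate pre-activations in directly (rather than computing them from weights) are illustrative assumptions, not the authors' code.

```python
import math


def stable_sigmoid(x):
    """Sigmoid written via tanh so large |x| cannot overflow."""
    return 0.5 * (1.0 + math.tanh(0.5 * x))


def slstm_step(state, i_pre, f_pre, o_pre, z_pre):
    """One scalar step with exponential input/forget gates.

    state = (c, n, m): cell state, normalizer state, stabilizer state.
    The *_pre arguments are gate pre-activations (assumed precomputed).
    """
    c, n, m = state
    # Gates are exp(i_pre) and exp(f_pre); work in log space for stability.
    m_new = max(f_pre + m, i_pre)
    i_gate = math.exp(i_pre - m_new)
    f_gate = math.exp(f_pre + m - m_new)
    z = math.tanh(z_pre)
    c_new = f_gate * c + i_gate * z   # cell state update
    n_new = f_gate * n + i_gate       # normalizer tracks total gate mass
    h = stable_sigmoid(o_pre) * c_new / n_new
    return (c_new, n_new, m_new), h


state = (0.0, 0.0, 0.0)
steps = [(800.0, 0.0, 1.0, 0.5), (2.0, 1.0, 0.0, -1.0)]
for i_pre, f_pre, o_pre, z_pre in steps:
    state, h = slstm_step(state, i_pre, f_pre, o_pre, z_pre)
    print(h)
```

The first step uses an input pre-activation of 800, which would overflow a plain `math.exp`. Subtracting the running maximum rescales `c` and `n` by the same factor, so their ratio, and therefore `h`, is unchanged.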

How is an LLM actually using the info given to it in its context? Is it misinterpreting anything or making things up?

Introducing ContextCite: a simple method for attributing LLM responses back to the context: https://t.co/bm1t7nybbh

w/ @bcohenwang, @harshays_, @kris_georgiev1 pic.twitter.com/Cu6xE5bnqF
— Aleksander Madry (@aleks_madry) May 6, 2024

MLPs are so foundational, but are there alternatives? MLPs place activation functions on neurons, but can we instead place (learnable) activation functions on weights? Yes, we KAN! We propose Kolmogorov-Arnold Networks (KAN), which are more accurate and interpretable than MLPs.🧵 pic.twitter.com/4Soe2M7NfG
— Ziming Liu (@ZimingLiu11) May 1, 2024
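The idea in the post above, learnable activation functions on the connections rather than fixed activations on the neurons, can be illustrated with a toy layer. This is a minimal sketch under stated assumptions, not the authors' implementation: each edge function is a learnable weighted sum of fixed Gaussian bumps (swapped in here for the B-splines the paper uses), inputs are assumed to lie roughly in [-1, 1], and the class name `KANLayer` is hypothetical.

```python
import math
import random


class KANLayer:
    """Toy KAN-style layer: one learnable 1-D function per (input, output) edge."""

    def __init__(self, n_in, n_out, n_basis=5, seed=0):
        rng = random.Random(seed)
        # Fixed Gaussian bump centers spread over [-1, 1] (assumed input range).
        self.centers = [-1.0 + 2.0 * k / (n_basis - 1) for k in range(n_basis)]
        self.width = 2.0 / (n_basis - 1)
        # coef[j][i][k]: learnable weight of bump k on the edge from input i to output j.
        self.coef = [
            [[rng.gauss(0.0, 0.1) for _ in range(n_basis)] for _ in range(n_in)]
            for _ in range(n_out)
        ]

    def edge(self, j, i, x):
        """Evaluate the learnable activation on edge (i -> j) at scalar x."""
        return sum(
            c * math.exp(-(((x - mu) / self.width) ** 2))
            for c, mu in zip(self.coef[j][i], self.centers)
        )

    def forward(self, xs):
        """Each output is a plain sum of its incoming edge functions."""
        return [
            sum(self.edge(j, i, x) for i, x in enumerate(xs))
            for j in range(len(self.coef))
        ]


layer = KANLayer(n_in=3, n_out=2)
print(layer.forward([0.1, -0.5, 0.9]))
```

In an MLP the learnable parameters are scalar weights and the nonlinearity is fixed. Here the nodes only add, and training would adjust the shape of each edge function through `coef`.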

Kolmogorov-Arnold Network is just an MLP : r/MachineLearning

[D] Kolmogorov-Arnold Network is just an MLP
by u/osamc in MachineLearning

📄Now preprinted – Part II of our philosophical introduction to language models! While Part I focused on continuity w/ classical debates, Part II is more forward-looking and cover new issues. 1/5 https://t.co/VQIZ6sb382 https://t.co/VD6SvobcjZ pic.twitter.com/vfl0mEB1Uv
— Raphaël Millière (@raphaelmilliere) May 7, 2024

Synthetic data generation cannot expand the manifold (the model's knowledge), but it can still be very useful. It can act as a training data denoising process that makes your data better suitable for curve fitting. https://t.co/OXOXKx1ZzE
— François Chollet (@fchollet) May 11, 2024

Consistency Large Language Models: A Family of Efficient Parallel Decoders | Hao AI Lab @ UCSD

https://hao-ai-lab.github.io/blogs/cllm

DOCCI

https://google.github.io/docci/

Be Sure To Read This Week’s Main Post:

This week’s executive overview and top links are here:

AI News #32: Week Ending 05/10/2024 with Executive Summary and Top 70 Links

The post you just read is a deep-dive extension of my weekly newsletter, This Week In AI, an executive summary of the top things to know in AI. Each week, I create an accessible overview so that laypeople can feel confident they are conversant with the week’s AI developments. I include a curated list of must-click links of the week, offering everyone a hands-on opportunity to explore the most intriguing updates in artificial intelligence across various categories, including robotics, imagery, video, AR/VR, science, ethics, and more. Beyond the overview, I post these topic-based deeper dives (below). If you haven’t read this week’s overview, I recommend starting there.

Credits/Sources

Most of these weekly links come from just a few prolific oversharing sources. Please follow them, as they work hard to find the news each week and they make it a lot easier for me to compile.