A fashion photoshoot of a runway look inspired by old rusty toolboxes. A large screen displays the word “Tech” --ar 4:3 --style raw

Creating a Pipeline for Generating Synthetic Data for Fine-Tuning Custom Embedding Models. 👀

Step 1 Create a Knowledge Base: Start with preparing your domain specific knowledge base, such as PDFs or other documents containing information. Convert the content of these documents… pic.twitter.com/0mYDJKMylY
— Philipp Schmid (@_philschmid) June 5, 2024

At current growth rates, AI runs out of easy-to-access high quality data by 2028, depending on how aggressive training is.

There are techniques that may extend this (eg synthetic data) and possibilities for using less-easily-accessed data. (Google & Meta are sitting on a lot). https://t.co/wvxi60y3in
— Ethan Mollick (@emollick) June 7, 2024

Given all this, when will we exhaust the web's text?

Training a compute-optimal dense model on ~100T tokens for 4 epochs would take ~5e28 FLOP (around 3 OOMs above GPT-4). At historical growth rates, we'll reach this level by 2028. 7/12 pic.twitter.com/p2bPWhg2M5
— Epoch AI (@EpochAIResearch) June 6, 2024
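The ~5e28 FLOP figure can be roughly reproduced with a back-of-envelope calculation. The sketch below is not Epoch AI's own method: it assumes the common approximation that training compute is about 6 × parameters × tokens, and a Chinchilla-style compute-optimal ratio of about 20 tokens per parameter. Neither assumption is stated in the tweet; only the ~100T tokens and the 4 epochs come from it.

```python
# Back-of-envelope check of the ~5e28 FLOP figure quoted above.
# Assumptions (not from the tweet): C ≈ 6 * N * D for training compute,
# and roughly 20 training tokens per parameter for a compute-optimal model.

unique_tokens = 100e12     # ~100T tokens of web text (from the tweet)
epochs = 4                 # from the tweet
tokens_per_param = 20      # assumed compute-optimal ratio

tokens_seen = unique_tokens * epochs        # D: total tokens processed
params = tokens_seen / tokens_per_param     # N: implied model size
train_flop = 6 * params * tokens_seen       # C: approximate training compute

print(f"D = {tokens_seen:.1e} tokens")
print(f"N = {params:.1e} parameters")
print(f"C = {train_flop:.1e} FLOP")
```

Under these assumptions the result lands at about 4.8e28 FLOP, which is consistent with the "~5e28" in the thread.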

So this paper found you can cut the API token costs of using Chain of Thought prompting by over 20% with no decrease in accuracy for GPT-4 (though a decrease in math accuracy in GPT-3.5) by just adding the words "be concise."

That's all. LLMs are weird. https://t.co/79SpgacxI0 pic.twitter.com/ijPa4knVWE
— Ethan Mollick (@emollick) June 8, 2024
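To make the idea concrete, here is a minimal sketch of what "just adding the words" could look like in practice. The function names, the "Think step by step." instruction, and the token counts in the usage example are all hypothetical and chosen for illustration; the paper's exact prompts are not reproduced here.

```python
def build_cot_prompt(question: str, concise: bool = False) -> str:
    """Build a chain-of-thought prompt, optionally asking for brevity."""
    # Hypothetical instruction wording, not the paper's exact prompt.
    instruction = "Think step by step."
    if concise:
        instruction += " Be concise."
    return f"{question}\n\n{instruction}"


def token_savings(baseline_tokens: int, concise_tokens: int) -> float:
    """Fraction of response tokens saved relative to the baseline."""
    return 1 - concise_tokens / baseline_tokens


question = "A train travels 120 km in 1.5 hours. What is its average speed?"
print(build_cot_prompt(question, concise=True))

# Hypothetical token counts for two responses, for illustration only.
print(f"Savings: {token_savings(500, 390):.0%}")
```

Measuring the savings on your own workload means comparing the response token counts your API reports for the two prompt variants, while checking that accuracy holds for your model and task.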

[2405.09032] ICAL: Implicit Character-Aided Learning for Enhanced Handwritten Mathematical Expression Recognition

https://arxiv.org/abs/2405.09032

[2405.14831] HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models

https://arxiv.org/abs/2405.14831

[2405.21018v1] Improved Techniques for Optimization-Based Jailbreaking on Large Language Models

https://arxiv.org/abs/2405.21018v1

[2406.02350v1] LlamaCare: A Large Medical Language Model for Enhancing Healthcare Knowledge Sharing

https://arxiv.org/abs/2406.02350v1

[AINews] FineWeb: 15T Tokens, 12 years of CommonCrawl (deduped and filtered, you’re welcome) • Buttondown

https://buttondown.email/ainews/archive/ainews-fineweb-15t-tokens-of-commoncrawl

Towards Scalable Automated Alignment of LLMs

Great overview of methods used for automated alignment of LLMs.

The four main directions explored in the paper are the following:

– Aligning through inductive bias
– Aligning through behavior imitation
– Aligning through model… pic.twitter.com/0H9THXV5Hp
— elvis (@omarsar0) June 4, 2024

xLSTM: Extended Long Short-Term Memory

“performs favorably when compared to state-of-the-art Transformers and State Space Models, both in performance and scaling.”

LSTM is not dead! Looking forward to see the comeback of RNNs🔥 https://t.co/7vEilqoRIi https://t.co/30NgJqyRcP pic.twitter.com/U4GcSnVquR
— hardmaru (@hardmaru) June 5, 2024

zeux.io – LLM inference speed of light

https://zeux.io/2024/03/15/llm-inference-sol

Mesop: Quickly build web UIs in Python

Used at Google for rapid internal app development

https://google.github.io/mesop

DreamMat: High-quality PBR Material Generation with Geometry- and Light-aware Diffusion Models

https://zzzyuqing.github.io/dreammat.github.io

Qdrant is now fully integrated with @neo4j's APOC procedures, bringing advanced vector search capabilities to your graph database applications! 🚀

📖 Read the documentation: https://t.co/tAxTa3iivM
— Qdrant (@qdrant_engine) June 7, 2024

⚡️ FastEmbed 0.3.0 is here!

Now featuring Image embeddings (ResNet50), multimodal embeddings (CLIP), late interaction embeddings (ColBERT), and an innovative type of sparse embeddings. 🙌

GitHub: https://t.co/xncvKArHx6
Change Log: https://t.co/J4LQjwHJKT pic.twitter.com/Eto7dd8VnU
— Qdrant (@qdrant_engine) June 6, 2024

LLM Merging Competition: Building LLMs Efficiently through Merging | NeurIPS 2024 Challenge

https://llm-merging.github.io

How to organize and generate high quality data is the secret sauce of fine tuning.

Daniel is going to provide a masterclass on this topic https://t.co/9oMI3CZB7C https://t.co/PqO7pOnzB1
— Hamel Husain (@HamelHusain) June 4, 2024

Emmanuel elaborates why he's increasingly bearish on fine-tuning in this talk: Why Fine-Tuning is Dead

I am not as bearish, which is why I think the talk is interesting! https://t.co/HfB7PWmb0B https://t.co/KZg0qjeEaE pic.twitter.com/SXD95dQ0mK
— Hamel Husain (@HamelHusain) June 7, 2024

BrightEdge Releases Post Google I/O Data on The Impact of

https://www.globenewswire.com/news-release/2024/06/04/2893289/0/en/BrightEdge-Releases-Post-Google-I-O-Data-on-The-Impact-of-AI-Overviews.html

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Presents Mamba-2, which outperforms Mamba and Transformer++ in both perplexity and wall-clock time https://t.co/Sd3J3kPG5W pic.twitter.com/C2nAisXcoN
— Aran Komatsuzaki (@arankomatsuzaki) June 3, 2024

Are We Done with MMLU?

Creates MMLU-Redux, which is a subset of 3,000 manually re-annotated questions across 30 MMLU subjects

data: https://t.co/euRW7hbHIj
abs: https://t.co/me7uQPoTno pic.twitter.com/nBln3CcZ0Y
— Aran Komatsuzaki (@arankomatsuzaki) June 7, 2024

Awesome and highly useful: FineWeb-Edu 📚👏
High quality LLM dataset filtering the original 15 trillion FineWeb tokens to 1.3 trillion of the highest (educational) quality, as judged by a Llama 3 70B. +A highly detailed paper.

Turns out that LLMs learn a lot better and faster… https://t.co/f3wqPbNkJ5 pic.twitter.com/9nXaet5tmG
— Andrej Karpathy (@karpathy) June 2, 2024

Transformers are SSMs

Generalized Models and Efficient Algorithms Through Structured State Space Duality

While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown pic.twitter.com/9400BttLNR
— AK (@_akhaliq) June 3, 2024

The Geometry of Concepts in LLMs

Studies the geometry of categorical concepts and how the hierarchical relations between them are encoded in LLMs.

Finding from the paper: "Simple categorical concepts are represented as simplices, hierarchically related concepts are orthogonal… pic.twitter.com/MA9AJ5eFGb
— elvis (@omarsar0) June 4, 2024

Thought-Augmented Reasoning with LLMs

Presents a thought-augmented reasoning approach, Buffer of Thoughts, to enhance the accuracy, efficiency, and robustness of LLM-based reasoning.

It leverages a meta-buffer containing high-level thoughts (thought templates) distilled from… pic.twitter.com/cBi3aJxmJe
— elvis (@omarsar0) June 7, 2024

llm.c by Hand✍️

C programming + matrix multiplication by hand

This combination is perhaps as low as we can get to explain how the Transformer works.

Special thanks to @karpathy for encouraging early feedback and @7etsuo for helping me understand the pragma magic.

I hope… pic.twitter.com/jx1Ye0r0ei
— Tom Yeh | AI by Hand ✍️ (@ProfTomYeh) June 4, 2024

This might be one of the most important 45-mn read you could indulge in today if you want to understand the secret behind high performance large language models like Llama3, GPT-4 or Mixtral

Inspired by the @distillpub interactive graphics papers, we settled to write the most… https://t.co/Bk764EXvcP
— Thomas Wolf (@Thom_Wolf) June 2, 2024

This is really a 'WOW' paper. 🤯

Claims that MatMul operations can be completely eliminated from LLMs while maintaining strong performance at billion-parameter scales and by utilizing an optimized kernel during inference, their model’s memory consumption can be reduced by more… pic.twitter.com/RD6Iyb1lrV
— Rohan Paul (@rohanpaul_ai) June 7, 2024

This new LoRA technique Orthonormal Low-Rank Adaptation (OLoRA) significantly accelerates the convergence of LLM training while preserving the efficiency benefits of LoRA, such as the number of trainable parameters and GPU memory footprint. 🔥

📌 OLoRA not only converges faster… pic.twitter.com/Aunvq3gZSW
— Rohan Paul (@rohanpaul_ai) June 7, 2024

Teach LLMs to internalize chain-of-thought (CoT) reasoning, without generating explicit intermediate steps, enabling implicit CoT reasoning during inference.

📌 Stepwise Internalization that successfully teaches LLMs to reason implicitly achieves high accuracy while maintaining… pic.twitter.com/EwrexN72wL
— Rohan Paul (@rohanpaul_ai) June 6, 2024

I am finding Infinity quite awesome ✨ .

It's a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.

—

– Deploy any model from MTEB: deploy the model you know from SentenceTransformers

– Fast… pic.twitter.com/YfvO2JATtS
— Rohan Paul (@rohanpaul_ai) June 7, 2024

Nice overview in this Paper – "Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey"

📌 Parameter-Efficient Fine-Tuning (PEFT): The core concept revolves around adapting pre-trained large models to specific tasks by modifying only a small subset of… pic.twitter.com/7nJLRAve1A
— Rohan Paul (@rohanpaul_ai) June 4, 2024

Prompting correctly is power. Keep navigating the deep, dark depth of the latent space, and then you have a real advantage (jailbreaking, adhering to JSON schemas, grounding and much more) pic.twitter.com/4fIRpQKRCe
— Rohan Paul (@rohanpaul_ai) June 4, 2024

been learning a lot about LLMs etc over the past year, organized some of my favorite explainers into a “textbook-shaped” resource guide

wish i’d had this at the start, maybe it can useful to others on a similar journey https://t.co/54gZimsOnO pic.twitter.com/mbiaJzQiu6
— will brown (@willccbb) June 5, 2024

It's also fairly clear to me rn that Memory databases are to 2024 what Vector databases were to 2023

@hwchase17 has a huge hit brewing on his hands with langmem https://t.co/bUsjFjJx90
— swyx (@swyx) April 6, 2024

Excited to share what I've been working on as part of the former Superalignment team!

We introduce a SOTA training stack for SAEs. To demonstrate that our methods scale, we train a 16M latent SAE on GPT-4. Because MSE/L0 is not the final goal, we also introduce new SAE metrics. https://t.co/0uc65Ex5YM
— Leo Gao (@nabla_theta) June 6, 2024

We introduce a new SAE training stack based on a TopK activation function. This eliminates feature shrinking and lets us set L0 directly. We find that our method performs well on the MSE/L0 frontier. Our method has very few dead latents, even at 16M scale. pic.twitter.com/Y3AVGD1fbB
— Leo Gao (@nabla_theta) June 6, 2024
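For readers unfamiliar with the idea, a TopK activation keeps only the k largest latent pre-activations and zeroes the rest, so the number of active latents (L0) is fixed by k rather than tuned indirectly through a sparsity penalty. The following is a minimal pure-Python sketch of that one step, not the authors' training stack; the function name and the example values are hypothetical.

```python
def topk_activation(pre_acts: list[float], k: int) -> list[float]:
    """Keep the k largest pre-activations unchanged and zero out the rest."""
    if k <= 0:
        return [0.0] * len(pre_acts)
    # Indices of the k largest values (ties broken by position).
    order = sorted(range(len(pre_acts)), key=lambda i: pre_acts[i], reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(pre_acts)]


# Hypothetical encoder pre-activations for a single input.
pre_acts = [0.3, -1.2, 2.5, 0.9, 0.1]
latents = topk_activation(pre_acts, k=2)
l0 = sum(1 for v in latents if v != 0.0)

print(latents)  # [0.0, 0.0, 2.5, 0.9, 0.0]
print(l0)       # 2
```

Because the surviving values pass through unchanged, this sketch applies no penalty that would pull their magnitudes toward zero, which is one way to read the tweet's point about feature shrinking.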

Built a simple but seemingly quite difficult benchmark for analyzing malicious solidity contract code. So far only the top closed models are capable of occasionally identifying code that is malicious, gpt-4o and claude-opus every open model I tried fails > 95% of the time
— anton (@abacaj) June 5, 2024

blog.alexalemi.com KL is All You Need

https://blog.alexalemi.com/kl-is-all-you-need.htm

This week’s executive overview and top links are here:

AI News #36: Week Ending 06/07/2024 with Executive Summary and Top 40 Links

The post you just read is a deep-dive extension of my weekly newsletter, This Week In AI, an executive summary of the top things to know in AI. Each week, I create an accessible overview so that laypeople can feel confident they are conversant with the week’s AI developments. I include a curated list of must-click links of the week, offering everyone a hands-on opportunity to explore the most intriguing updates in artificial intelligence across various categories, including robotics, imagery, video, AR/VR, science, ethics, and more. Beyond the overview, I post these topic-based deeper dives (below). If you haven’t read this week’s overview, I recommend starting there.

Credits/Sources

Most of these weekly links come from just a few prolific oversharing sources. Please follow them, as they work hard to find the news each week and they make it a lot easier for me to compile.