a sunny day at the beach. a sand sculpture of a laptop computer. the word “Tech” is drawn in the sand --ar 5:3 --style raw
“You can now start building with Gemini 1.5 Flash and Pro models using our API pay-as-you-go service for developers. 🛠️ Flash is designed to be fast and efficient to serve – and we’re increasing the rate limit to 1000 requests per minute. Find out more →”
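For anyone who wants to try it, a minimal sketch of hitting the pay-as-you-go API with the google-generativeai Python SDK; the model name follows Google’s announcement, and the prompt is just an example:

```python
# Minimal sketch of the pay-as-you-go Gemini API via the
# google-generativeai SDK; the API key placeholder and prompt are mine.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key from Google AI Studio

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Summarize the Gemini 1.5 Flash announcement.")
print(response.text)
```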
“DPO (Direct Preference Optimization) can NOT be as good as PPO (Proximal Policy Optimization) – from the latest Google research 🤔 It investigates why online reinforcement learning algorithms (like PPO) for aligning LLMs outperform offline algorithms (like DPO), despite both using…”
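For context on the online/offline distinction: DPO trains on a fixed preference dataset with no fresh sampling from the policy. A minimal sketch of the standard DPO loss (function and argument names are mine):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss from summed log-probs of chosen/rejected responses.

    Offline: the log-probs come from a fixed preference dataset, with no
    fresh sampling from the policy -- the contrast with online PPO that
    the tweet is about.
    """
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```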
“1. Praktika (@PraktikaEnglish) Round: $35.5M Series A. Praktika is a language-learning app that uses AI avatars to simulate real-life conversational scenarios. Demo: …”
ICLR 2024 — Best Papers & Talks (ImageGen, Vision, Transformers, State Space Models) ft. Christian Szegedy, Ilya Sutskever
“🔁 Repetitions in LangSmith 🔁 You can now run multiple repetitions of your experiment in LangSmith. This helps smooth out noise from variability introduced by your application or from your LLM-as-a-judge evaluator, so you can build confidence in the results of your experiment.”
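A minimal usage sketch, assuming the `num_repetitions` argument on LangSmith’s `evaluate()`; the dataset name and target function here are placeholders:

```python
# A minimal sketch, assuming the num_repetitions parameter on
# langsmith's evaluate(); dataset name and target function are mine.
from langsmith.evaluation import evaluate

def my_app(inputs: dict) -> dict:
    # ... call your chain/LLM here ...
    return {"output": "stub answer"}

results = evaluate(
    my_app,
    data="my-eval-dataset",   # an existing LangSmith dataset
    evaluators=[],            # e.g. an LLM-as-a-judge evaluator
    num_repetitions=5,        # run each example 5 times to average out noise
)
```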
“Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization. We study whether transformers can learn to implicitly reason over parametric knowledge, a skill that even the most capable language models struggle with. Focusing on two…”
“Meta presents Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach. Features trained on their automatically curated datasets outperform ones trained on manually curated data.”
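The paper’s actual pipeline is hierarchical k-means over self-supervised embeddings; below is a toy single-level sketch of the cluster-balanced sampling idea with scikit-learn (all names mine):

```python
# Toy sketch of cluster-balanced curation: cluster embeddings, then
# sample a fixed budget uniformly per cluster so no concept dominates.
# (The Meta paper uses hierarchical k-means; this is one level only.)
import numpy as np
from sklearn.cluster import KMeans

def curate(embeddings: np.ndarray, n_clusters: int = 100,
           per_cluster: int = 50, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, random_state=seed,
                    n_init="auto").fit_predict(embeddings)
    keep = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        keep.extend(rng.choice(idx, size=min(per_cluster, len(idx)), replace=False))
    return np.array(sorted(keep))  # indices of the curated subset
```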
“Introducing @mem0ai – long-term memory for any LLM. ✨ Features: • Remembers and recalls contextually • Enhances personalization and relevance of responses • Compatible with any LLM 👇🏻 Sign up for early access in the next tweet.”
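Since this was an early-access announcement, the exact API is not in the tweet; the sketch below follows the mem0 Python client as later published, so treat the signatures as assumptions:

```python
# Sketch of the advertised workflow (remember, recall, personalize),
# based on the mem0 Python client as later published -- an assumption.
from mem0 import Memory

m = Memory()
m.add("Alice prefers short, bullet-point answers.", user_id="alice")

# Before answering, recall relevant memories to personalize the prompt.
hits = m.search("How should I format my reply?", user_id="alice")
print(hits)
```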
“Today we published our VMware Validated Solution – Private AI Ready Infrastructure for VMware Cloud Foundation. If you’re ready to dig into the technical details on how to architect, deploy and operate #PrivateAI infrastructure, this is a great place to start.” / X
“I’ve been asked by a few first-year PhD students how to start LLM research on a topic X, say long-context modeling. My number one suggestion — though it seems a bit unconventional — is *not* to read any papers related to long context, but to talk to the model. Talk to the model about a…” / X
“Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training. LLMs are computationally expensive to pre-train due to their large scale. Model growth emerges as a promising approach by leveraging smaller models to accelerate the training of…”
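The growth operator in question, depthwise stacking, is easy to sketch: initialize a deeper model by repeating the trained blocks of a shallower one. A toy PyTorch version (names mine; see the paper for the exact operator):

```python
# Toy sketch of depthwise stacking: a deeper model is initialized by
# repeating the trained layers of a shallower one, then pre-training
# continues from that much better starting point.
import copy
import torch.nn as nn

def grow_by_stacking(small_layers: nn.ModuleList, growth_factor: int = 2) -> nn.ModuleList:
    big = []
    for _ in range(growth_factor):
        for layer in small_layers:
            big.append(copy.deepcopy(layer))  # reuse trained weights
    return nn.ModuleList(big)

# e.g. a 12-layer model's blocks become the init for a 24-layer model
```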
“VeLoRA: Memory-Efficient Training using Rank-1 Sub-Token Projections. Large language models (LLMs) have recently emerged as powerful tools for tackling many language-processing tasks. Despite their success, training and fine-tuning these models is still far too…”
“AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct. We introduce AutoCoder, the first Large Language Model to surpass GPT-4 Turbo (April 2024) and GPT-4o in pass@1 on the HumanEval benchmark (90.9% vs. 90.2%). In addition, AutoCoder offers a more…”
“So position embedding is this year’s linear attention, isn’t it? But I prefer this: linear attention was about removing capacity from the model, which didn’t make sense long term. Position embedding is about adding missing capabilities to the model, which makes a lot more sense.” / X
“@danielhanchen I thought I didn’t have to deal with these, but already the 350M model (14 hours on 8 GPUs) sometimes hangs with a cryptic MPI error. So I have to put the whole optimization into a `while 1` loop and a script that watches the log file and sends…” / X
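A sketch of the workaround described in that tweet; the launcher script, log path, and stall timeout below are all my placeholders:

```python
# Sketch of the described workaround: keep relaunching training and
# watch the log for stalls; paths, command, and timeout are mine.
import os
import subprocess
import time

LOG = "train.log"
STALL_SECS = 600  # assume a hang if the log is silent for 10 minutes

while True:  # the "while 1" loop around the whole optimization
    proc = subprocess.Popen(["bash", "launch_training.sh"])  # hypothetical launcher
    while proc.poll() is None:
        time.sleep(30)
        if os.path.exists(LOG) and time.time() - os.path.getmtime(LOG) > STALL_SECS:
            proc.kill()  # cryptic MPI hang: kill and relaunch from the last checkpoint
            break
    time.sleep(5)
```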
1-bit LLMs Could Solve AI’s Energy Demands – IEEE Spectrum
“I tried adding external memory to positional embedding, i.e., a readout from an external memory system that interacts with the embeddings based on some function of the token embeddings (here, the mean across the sequence). CoPE might be good to try too; it’s a really cool idea.” / X
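A toy PyTorch reading of that description (module and parameter names are mine): pool the token embeddings, use the pooled vector to attend over a learned memory, and add the readout alongside the position embeddings:

```python
import torch
import torch.nn as nn

class MemoryAugmentedPositions(nn.Module):
    """Toy reading of the tweet: a readout from an external memory,
    keyed by the mean token embedding, is added to position embeddings."""
    def __init__(self, d_model: int, max_len: int, n_slots: int = 32):
        super().__init__()
        self.pos = nn.Embedding(max_len, d_model)
        self.memory = nn.Parameter(torch.randn(n_slots, d_model))

    def forward(self, tok_emb: torch.Tensor) -> torch.Tensor:
        # tok_emb: (batch, seq, d_model)
        seq_len = tok_emb.shape[1]
        query = tok_emb.mean(dim=1)                       # mean across the sequence
        attn = torch.softmax(query @ self.memory.T, -1)   # (batch, n_slots)
        readout = attn @ self.memory                      # (batch, d_model)
        positions = self.pos(torch.arange(seq_len, device=tok_emb.device))
        return tok_emb + positions + readout[:, None, :]  # broadcast over seq
```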
[21 May 2024] Life after DPO (for alignment) – Google Slides
[2405.16749v1] DMPlug: A Plug-in Method for Solving Inverse Problems with Diffusion Models
[AINews] FineWeb: 15T Tokens, 12 years of CommonCrawl (deduped and filtered, you’re welcome) • Buttondown
“A 6-hour performance of Shrink at Nuit Blanche in Paris 📹” – Resimlerle sanat
The decline of the user interface | InfoWorld
The 350M model I trained last night was 30B tokens, 14 hours, ~$200. Convenientl… | Hacker News
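Those figures are easy to sanity-check, assuming the 8 GPUs mentioned in the tweet above:

```python
# Quick arithmetic on the HN title's numbers (8 GPUs per the earlier tweet).
tokens, hours, dollars, gpus = 30e9, 14, 200, 8
print(f"{tokens / (hours * 3600):,.0f} tokens/sec overall")  # ~595,000 tokens/sec
print(f"${dollars / (hours * gpus):.2f} per GPU-hour")       # ~$1.79/GPU-hour
```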
Stanford CS 224N | Natural Language Processing with Deep Learning
Neural-Circuit-Diagrams/Guide/Guide.ipynb at main · vtabbott/Neural-Circuit-Diagrams · GitHub