
Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference – Apple Machine Learning Research

https://machinelearning.apple.com/research/talaria

“if you want to know where @browsercompany is heading then today’s release gives you the clues it includes my favorite feature that we’ve ever shipped (as tiny as it may seem at first glance) as always, here’s the backstory on why this “small” improvement is such a big deal… 

“Sakana is the most innovative AI lab right now, hands down. This delightful work discovers a whole pile of new preference optimization loss functions by using evolutionary strategies with entire LLM training processes involved. The resulting LRML method even seems to” / X

“Multimodal Table Understanding Introduces Table-LLaVA 7B, a multimodal LLM for multimodal table understanding. Competitive with GPT-4V and significantly outperforms existing MLLMs on multiple benchmarks. They also develop a large-scale dataset MMTab, covering table images, 

Agile Otter Blog: Programming Is Mostly Thinking

“Transformer models can learn robust reasoning skills (beyond those of GPT-4-Turbo and Gemini-1.5-Pro) through a stage of training dynamics that continues far beyond the point of overfitting (i.e. with ‘Grokking’) 🤯 For a challenging reasoning task with a large search space, 

“Announcing LiveBench AI – The WORLD’S FIRST LLM Benchmark That Can’t Be Gamed!! We (Abacus AI) partnered with Yann LeCun and his team to create LiveBench AI! LiveBench is a living/breathing benchmark with new challenges that you CAN’T simply memorize. Unlike blind human eval, 

“CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery Comprises ~5K meticulously curated test samples, covering 26 subfields across 4 key areas of CS repo: 

[2406.06007v1] CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

“🚨 Announcing LiveBench, a challenging new general-purpose live LLM benchmark! 🚨 Thanks @crwhite_ml and @SpamuelDooley for leading the charge! Link: 

“DPO is out DiscoPOP is in 🔥 This paper proposes to take the human equations out from DPO with DiscoPOP Discovered Preference Optimization 🤯 ✨ All existing SOTA Preference Optimization algorithms have been developed by human experts. These solutions are inherently 
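For context, the human-designed baseline that DiscoPOP searches beyond is the standard DPO objective. A minimal sketch of that loss for a single preference pair (the log-probability values below are hypothetical, and `beta` controls how far the policy may deviate from the reference model):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    logp_w / logp_l: policy log-probs of the chosen / rejected response.
    ref_logp_w / ref_logp_l: the same under the frozen reference model.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)): small when the policy prefers the chosen
    # response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the policy's preference for the chosen response
# grows relative to the reference policy's preference.
assert dpo_loss(-1.0, -3.0, -2.0, -2.0) < dpo_loss(-2.0, -2.0, -2.0, -2.0)
```

DiscoPOP's point is that this closed-form objective was hand-derived; the paper instead has LLMs propose and evolve alternative loss functions and evaluates them in full training runs.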

“LoRA Finetuning of LLMs can be mysterious. 🤯 First the basics – with LoRA the fine-tuned weight W′ can be represented as: 👉 W′ = W_0 + ∆W = W_0 + BA Where the trainable parameters are the low-rank matrices B and A Essentially, for finetuning, one can either initialize B to 
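The decomposition in that quote is easy to sketch. Below is an illustrative NumPy toy (dimensions and scaling are hypothetical choices; real implementations apply this per transformer layer), showing why the common init is B = 0: the adapter then contributes nothing, so W′ = W₀ at the start of finetuning.

```python
import numpy as np

d_in, d_out, r, alpha = 8, 8, 2, 16
rng = np.random.default_rng(0)

W0 = rng.normal(size=(d_out, d_in))    # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))               # trainable; zero init => W' == W0 at start

def lora_forward(x):
    # Equivalent to x @ (W0 + (alpha / r) * B @ A).T,
    # without ever materializing the full W'.
    return x @ W0.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.normal(size=(4, d_in))
# With B = 0 the output matches the base model exactly.
assert np.allclose(lora_forward(x), x @ W0.T)
```

Only A and B (2·r·d parameters instead of d²) receive gradients, which is where LoRA's memory savings come from.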

“👏 Hats off to Wenbo Pan for creating Faro Yi 9B DPO! With 200K context in just 16GB VRAM, it’s a game-changer for efficient AI. Dive into Wenbo’s impressive work on @huggingface here: 

Building AI products — Benedict Evans

“An 8B-3.5T hybrid SSM model gets better accuracy than an 8B-3.5T transformer trained on the same dataset:
* 7% attention, the rest is Mamba2
* MMLU jumps from 50 to 53.6%
* Training efficiency is the same
* Inference cost is much less 

BitsFusion

Breaking Down Barriers to AI Innovation with Reid Hoffman & Kevin Scott – YouTube

Announcing Mozilla Builders – Mozilla Innovations

Active Stereo Without Pattern Projector

“‘Hallucinations must occur at a certain rate for language models that satisfy a statistical calibration condition appropriate for generative language models.’ Paper – Calibrated Language Models Must Hallucinate 

“RLHF heavily reduces LLM creativity and output variety. 🤔 This paper explores the unintended consequences of aligning Large Language Models with RLHF. While alignment reduces toxic and biased content, it also seems to limit their creativity, defined as the ability to 

Samsung Showcases AI-Era Vision and Latest Foundry Technologies at SFF 2024 | Business Wire

Generative AI is not going to build your engineering team for you – Stack Overflow

Firefox will upgrade more Mixed Content in Version 127 – Mozilla Security Blog

“Unsloth🦥is now in @huggingface AutoTrain! You can QLoRA finetune a LLM like Llama-3, Mistral, Phi3, Qwen2 & Gemma 2x faster, use 70% less memory and get 4x longer contexts than FA2! And @UnslothAI can be called directly via AutoTrain’s UI or through HF’s online🔧Train button!” / X

Heads up! You’ve scrolled to the end of this category. There may have been just one or two links above, so scroll back up and double-check that you didn’t skim past them.

Be Sure To Read This Week’s Main Post:

This week’s executive overview and top links are here:

AI News #37: Week Ending 06/14/2024 with Executive Summary and Top 7 Must-Read Links

The post you just read is a deep-dive extension of my weekly newsletter, This Week In AI, an executive summary of the top things to know in AI. Each week, I create an accessible overview so laypeople can feel confident they are conversant with the week’s AI developments. I include a curated list of must-click links of the week, offering everyone a hands-on opportunity to explore the most intriguing updates in artificial intelligence across various categories, including robotics, imagery, video, AR/VR, science, ethics, and more. Beyond the overview, I post these topic-based deeper dives. If you haven’t read this week’s overview, I recommend starting there.

Credits/Sources

Most of these weekly links come from just a few prolific sources. Please follow them: they work hard to find the news each week, and they make it a lot easier for me to compile.

For previous issues, please visit the archives!

Thanks for reading!
