an abandoned factory in a forest with a trail sign that reads “Technology” --ar 5:3 --style raw

This week’s category cover theme is a sign in a forest. Each category image prompt is a derivative of the formula “an [category-themed object] in a forest with a trail sign that reads ‘[category name]’”. Using a weekly theme cuts the cover creation time to about 20 minutes, down from several hours.
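The weekly formula above can be sketched as a simple template function; the object and category names below are illustrative, not an exhaustive list from the newsletter:

```python
# Hypothetical sketch of the weekly cover-prompt formula described above.
# The themed object and category name are filled into a fixed template,
# with the Midjourney flags (--ar 5:3 --style raw) appended at the end.
def cover_prompt(themed_object: str, category: str) -> str:
    """Fill the weekly template: object + trail sign in a forest."""
    return (f'an {themed_object} in a forest with a trail sign '
            f'that reads "{category}" --ar 5:3 --style raw')

print(cover_prompt("abandoned factory", "Technology"))
```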

Reflections on AI Engineer Summit 2023

Grounded 3D-LLM

“Hallucinations are one of the biggest blockers to production LLMs & agents. No hallucinations (<5%) have been achieved internally — and for customers. We’ve been able to tune LLMs to recall specific key terms and figures with *photographic memory*, e.g., chatting about a product” / X

Mapping the Mind of a Large Language Model | Hacker News

“Nice article in the Financial Times where I explain that Auto-Regressive LLMs are insufficient to reach human-level intelligence (or even cat-level intelligence). But alternative architectures that I call “objective driven” may reach human-level intelligence one day. They use world…” / X

“My statement seems obvious. Still, most recent CLIP-style papers train on EN data, either explicitly, or via more subtle filters (CLIP filter, WordNet classes, …) Why? Several recent papers show this improves quality. But what quality? The usual suspects: ImageNet, COCO…” / X

LangChain v0.2: A Leap Towards Stability

Let’s talk about LLM evaluation

Ten Commandments to Deploy Fine-Tuned Models in Prod – Google Slides

“📄Refreshed docs for LangChain v0.2 We’ve listened to your feedback and made major improvements to our docs. With the release of LangChain v0.2 today, we now have versioned docs, with clearer structure and consolidated content. Our docs are separated into: • Tutorials: 

Improving Prompt Consistency with Structured Generations

“Enjoyed this extremely comprehensive study on predicting language model performance 

“Fun fact: Transformer was almost named “CargoNet” by Noam. I’m glad he was outvoted and history took a different turn. 😅” / X

“Metal’s reports let you run complex, multi-step AI operations on large amounts of company data. A few use cases? ✅ Streamline information requests ✅ Conduct initial ESG diligence ✅ Summarize call transcripts and discover insights 😎 

Auto Wiki by Mutable.ai
View high-quality, automatically-generated documentation for any repository.

“Introducing “Hard Prompts” Category in Arena! In response to the community’s growing interest in evaluating models on more challenging tasks, we are excited to launch the new “Hard Prompts” category. We select user prompts that are more complex, specific, and problem-solving… 

“Nice report on challenges in evaluating LLMs. It also includes a section on best practices for language model evaluation. Great read and lessons on the very difficult task of LLM evaluation. 

“Enabling sparse, foundational LLMs for faster and more efficient models from Neural Magic and Cerebras 

“Introducing new factual knowledge through fine-tuning an LLM will increase the risk of hallucinations. This is what @Google is exploring in this paper and posits LLMs mostly acquire factual knowledge through pre-training, whereas fine-tuning teaches them to use it more 

“So far there are 3 major use cases for LLMs: 1. StackOverflow replacement 2. Do my homework (might be #1 or tied for #1) 3. Internal enterprise knowledge base There are many smaller use cases, including customer support chatbots, copyediting, spam, etc.” / X

“By end of 2024, steering foundation models in latent/activation space will outperform steering in token space (“prompt eng”) in several large production deployments. I felt skeptical about this in summer ’23, felt vaguely positive in Jan, and now think it’s more likely than not,” / X

“I discovered at ICLR 2024 that a lot of what I take for granted about LLM evaluation is actually not that widely known… So I made a blog! – how do we currently do LLM evaluation? ⚖️ – most importantly, what is it actually useful for? 🤔 

“once a week i tell a founder “stop trying to finetune models, and just go sell, use opus, use 4-turbo, and just raise prices, find value, go sell, and sell to rich people, stop selling to developers, sell to capital allocators, and not wage workers. make your roadster, get the” / X

“We don’t need skip connections or normalization layers either 

“Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? This is one of the more interesting LLM papers I read last week. It reports that LLMs struggle to acquire factual knowledge through fine-tuning. When examples with new knowledge are eventually learned they 

“If 2024 papers are to be trusted: You don’t need (most) attention you don’t need (most) kv cache You don’t need (most) FFN layers You don’t need a reward model You don’t need… all the stuff that still makes frontier models work, ironically” / X

“Achieve the performance of existing transformer LLMs, while requiring 5% of the training cost. 🔥 Paper: Linearizing Large Language Models 📌 Introduces a method called Scalable UPtraining for Recurrent Attention (SUPRA), that allows the conversion of pre-trained LLMs into 

“🧬🧬 LLM Generated UIs We’ve added a series of templates and documentation showing off how to build generative UI applications using LangChain JS/TS & Next.js. These templates include: – 🌆 generative UI in Next.js – 🤖 streaming agent events – 🛠️ streaming tool calls and more! 

“As part of 0.2, we did a docs overhaul: 📃versioned docs 🗺️MUCH simpler navigation 🪆”LangChain over time” section Would love feedback on the new structure and additions!” / X

“A saturated benchmark gives a false impression that the underlying progress is slowing down. Benchmarks are a proxy for what we care about, which is often hard to measure. When they are saturated, they are useless and even misleading.” / X

“It’s nice to have good names for things. I’m proud to have named or been involved in naming a bunch of things at Google over the years, including: MapReduce Bigtable Spanner TensorFlow Tensor Processing Units (TPUs) Pathways Protocol Buffers PaLM Gemini” / X

PEERING THROUGH PREFERENCES: UNRAVELING FEEDBACK ACQUISITION FOR ALIGNING LARGE LANGUAGE MODELS

Chameleon: Mixed-Modal Early-Fusion Foundation Models

[2104.14337] Dynabench: Rethinking Benchmarking in NLP

[2309.03882] Large Language Models Are Not Robust Multiple Choice Selectors

[2311.06233] Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models

[2404.13076] LLM Evaluators Recognize and Favor Their Own Generations

[2405.09789v1] LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation

[2405.10508] ART3D: 3D Gaussian Splatting for Text-Guided Artistic Scenes Generation

[2405.10523v1] Smart Expert System: Large Language Models as Text Classifiers

[2405.10612v1] Not All Prompts Are Secure: A Switchable Backdoor Attack Against Pre-trained Vision Transformers

[2405.12209v1] MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark

[2405.12564v1] ProtT3: Protein-to-Text Generation for Text-based Protein Understanding

[2405.12710v1] Text-Video Retrieval with Global-Local Semantic Consistent Learning

[2405.12832v1] Wav-KAN: Wavelet Kolmogorov-Arnold Networks

A root-server at the Internet’s core lost touch with its peers. We still don’t know why. | Ars Technica

“‘we are nowhere near the point of diminishing marginal returns on how powerful we can make AI models as we increase the scale of compute’ 

“(1/9) LLM as a Judge: Numeric Score Evals are Broken!!! LLM Evals are valuable analysis tools. But should you use numeric scores or classes as outputs? 🤔 TLDR: LLM’s suck at continuous ranges ☠️ – use LLM classification evals instead! 🔤 An LLM Score Eval uses an LLM to judge 
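The classification-style eval the thread argues for (discrete labels instead of numeric scores) can be sketched roughly as follows; the prompt wording, label set, and function names are my assumptions, not from the thread:

```python
# Sketch of an LLM-as-judge *classification* eval, following the thread's
# advice to prefer discrete labels over continuous numeric scores.
# The judge model call itself is not shown; in practice you would send
# `build_judge_prompt(...)` to an LLM and pass its reply to `parse_label`.
LABELS = ("correct", "incorrect", "unsure")  # assumed label set

def build_judge_prompt(question: str, answer: str) -> str:
    """Ask the judge model for exactly one label from the closed set."""
    return (
        "You are grading an answer. Reply with exactly one word: "
        f"{', '.join(LABELS)}.\n"
        f"Question: {question}\nAnswer: {answer}\nGrade:"
    )

def parse_label(raw: str) -> str:
    """Map the judge's raw reply onto the closed label set."""
    word = raw.strip().lower().split()[0].strip(".,")
    return word if word in LABELS else "unsure"

print(parse_label(" Correct. "))
```

Constraining the judge to a small label set makes the eval easier to parse and aggregate than a free-floating 1–10 score, which is the thread’s core claim.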

“Layer-Condensed KV Cache for Efficient Inference of Large Language Models Achieves up to 26× higher throughput than standard transformers and competitive performance in language modeling and downstream tasks repo: 

“Observational Scaling Laws and the Predictability of Language Model Performance Presents observational scaling laws – an approach that generalizes existing compute scaling laws to handle multiple model families using a shared, low-dim capability space 

Feature UMAP

First-ever AI Code Interpreter for R

Documentation Refresh for LangChain v0.2

“And it’s out! 😀 A good read if you want to think about doing robust evaluation, going in depths into the nits of it. 

“🚀How can we use LLMs to accelerate scientific discovery? Let’s find out! This year, hundreds of people from across the globe worked together in a hackathon to BUILD groundbreaking prototypes — showing the path to breakthroughs in next generation batteries, sustainability, 

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

We’ve raised $12M in Series A funding to reimagine presentations, powered by AI.

What I’ve Learned Building Interactive Embedding Visualizations – Casey Primozic’s Homepage
https://cprimozic.net/blog/building-embedding-visualizations-from-user-profiles/

Heads up! You’ve reached the end of this category. It may have contained just one or two links, so scroll back up and double-check that you didn’t miss them.

Be Sure To Read This Week’s Main Post:

This week’s executive overview and top links are here:

AI News #34: Week Ending 05/24/2024 with Executive Summary and Top 47 Links

The post you just read is a deep-dive extension of my weekly newsletter, This Week In AI, an executive summary of the top things to know in AI. Each week, I create an accessible overview for laypeople to feel confident they are conversant with the week’s AI developments. I include a curated list of must-click links of the week, to offer everyone a hands-on opportunity to explore the most intriguing updates in artificial intelligence across various categories, including robotics, imagery, video, AR/VR, science, ethics, and more. Beyond the overview, I post these topic-based deeper dives. If you haven’t read this week’s overview, I recommend starting there.

Credits/Sources

Most of these weekly links come from just a few prolific oversharing sources. Please follow them, as they work hard to find the news each week and they make it a lot easier for me to compile.

For previous issues, please visit the archives!

Thanks for reading!
