[Header image: pages of books form a winding road that stretches into the distance of a lush landscape]

“From Claude 100K to Gemini 10M, we are in the era of long-context language models. Why and how can a language model utilize information at any input location within a long context? We discover retrieval heads, a special type of attention head responsible for long-context factuality.” – https://twitter.com/Francis_YAO_/status/1783446286479286700
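The paper identifies these heads by their copy behavior during needle-in-a-haystack tests. Here is a minimal sketch of that retrieval-score idea (my paraphrase, not the authors' code; all names and tensor shapes are illustrative assumptions): a head scores a hit whenever its strongest attention lands inside the needle while the model is copying a needle token.

```python
import torch

def retrieval_score(attn, needle_positions, generated_ids, needle_ids):
    """attn: [num_steps, num_ctx_tokens], one head's attention at each decode
    step; needle_positions / needle_ids describe where the needle sits in the
    context and which token ids it contains. Shapes are illustrative."""
    hits = 0
    for step, tok in enumerate(generated_ids):
        strongest = attn[step].argmax().item()   # where this head looks hardest
        if strongest in needle_positions and tok in needle_ids:
            hits += 1                            # copying straight from the needle
    return hits / max(len(generated_ids), 1)     # fraction of copy-paste steps
```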

“Do models need to reason in words to benefit from chain-of-thought tokens? In our experiments, the answer is no! Models can perform on par with CoT using repeated ‘…’ filler tokens. This raises alignment concerns: using filler, LMs can do hidden reasoning not visible in CoT. 🧵” – https://twitter.com/jacob_pfau/status/1783951795238441449

“One of the best ways to improve LLM performance is to ask it to ‘think aloud’ (there are various techniques for doing this, including Chain of Thought). This also helps clearly establish the AI’s plans. This paper suggests that, in some cases, the AI can plan without revealing it.” – https://twitter.com/emollick/status/1784222322821382459

“When is LLMs’ #AlphaGoZero moment? Imagine #LLMs self-evolving without human supervision 🔥🔥🔥 Through #selfplay in an adversarial language game 🕹️, we observe continuous improvements in LLM reasoning 🚀. #AGI is getting closer! Check out our paper.” – https://twitter.com/cheng_pengyu/status/1780965366531006887

“OpenAI presents The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions. Today’s LLMs are susceptible to prompt injections, jailbreaks, and other attacks that allow adversaries to overwrite a model’s original instructions with their own malicious prompts.” – https://twitter.com/_akhaliq/status/1782607669376761989

LMSYS becoming less useful : r/LocalLLaMA – https://www.reddit.com/r/LocalLLaMA/comments/1c9nvpy/lmsys_becoming_less_useful/

“@deepwhitman @AIatMeta @lmsysorg no. people misunderstand chinchilla. chinchilla doesn’t tell you the point of convergence. it tells you the point of compute optimality. if all you care about is perplexity, for every FLOPs compute budget, how big model on how many tokens should you train? for reasons not fully…” – https://twitter.com/karpathy/status/1781033433336262691
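Karpathy's point in back-of-envelope form: Chinchilla's compute-optimal allocation is roughly D ≈ 20 tokens per parameter under the common C ≈ 6ND FLOPs approximation, which says nothing about when a model stops improving. You can keep training a small model far past that ratio, as Llama 3 did. A quick sketch of the arithmetic (the 20:1 and 6ND rules of thumb are the usual approximations, not exact values):

```python
import math

# Chinchilla rules of thumb: C ~= 6 * N * D, with D ~= 20 * N at the
# compute-optimal point, so N_opt = sqrt(C / 120).
def chinchilla_optimal(flops_budget):
    n_params = math.sqrt(flops_budget / 120)   # compute-optimal model size
    n_tokens = 20 * n_params                   # compute-optimal training tokens
    return n_params, n_tokens

n, d = chinchilla_optimal(1e24)                # a ~10^24 FLOPs budget
print(f"~{n/1e9:.0f}B params on ~{d/1e12:.1f}T tokens")   # ~91B on ~1.8T

# Llama 3 8B trained on 15T tokens, ~94x the "optimal" ~0.16T for its size:
# compute-suboptimal for training, but much cheaper to serve at inference.
```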

[2404.12358] From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function – https://arxiv.org/abs/2404.12358
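For reference, this is the standard DPO objective the paper reinterprets: in their token-level MDP view, the implicit reward term β log(π_θ/π_ref) behaves as an optimal Q-function/advantage rather than just a sequence-level reward.

```latex
\mathcal{L}_{\text{DPO}}(\pi_\theta;\pi_{\text{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[\log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}
    \right)\right]
```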

Self-Reasoning Tokens, teaching models to think ahead. – https://reasoning-tokens.ghost.io/reasoning-tokens/

PsyArXiv Preprints | Skill but not Effort Drive GPT Overperformance over Humans in Cognitive Reframing of Negative Scenarios – https://osf.io/preprints/psyarxiv/fzvd8 

[2404.12753] AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation – https://arxiv.org/abs/2404.12753

Building reliable systems out of unreliable agents | Rainforest QA – https://www.rainforestqa.com/blog/building-reliable-systems-out-of-unreliable-agents 

“SnapKV: LLM Knows What You Are Looking for Before Generation. Large Language Models (LLMs) have made remarkable progress in processing extensive contexts, with the Key-Value (KV) cache playing a vital role in enhancing their performance. However, the growth of the KV…” – https://twitter.com/_akhaliq/status/1782946902952034546
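A minimal sketch of the SnapKV idea as I read the abstract (not the authors' code; the shapes, window size, and pooling width are assumptions): score prefix positions by how much the last few prompt tokens attend to them, then keep only the top-scoring KV entries per head, plus that observation window itself.

```python
import torch
import torch.nn.functional as F

def snap_kv(keys, values, attn, window=32, keep=256):
    """keys/values: [heads, seq, dim]; attn: [heads, window, seq] attention
    from the last `window` prompt tokens. Shapes are illustrative."""
    heads, seq, dim = keys.shape
    votes = attn[:, :, : seq - window].sum(dim=1)        # [heads, seq - window]
    # Light pooling so selected positions come in local clusters, not singletons.
    votes = F.avg_pool1d(votes.unsqueeze(1), 7, stride=1, padding=3).squeeze(1)
    idx = votes.topk(min(keep, votes.shape[-1]), dim=-1).indices
    idx = idx.sort(dim=-1).values                        # restore causal order
    gather = idx.unsqueeze(-1).expand(-1, -1, dim)       # [heads, keep, dim]
    k_kept = torch.cat([keys.gather(1, gather), keys[:, -window:]], dim=1)
    v_kept = torch.cat([values.gather(1, gather), values[:, -window:]], dim=1)
    return k_kept, v_kept                                # compressed KV cache
```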

Rho-1: Not All Tokens Are What You Need. (A very efficient way to train SOTA models) : r/LocalLLaMA – https://www.reddit.com/r/LocalLLaMA/comments/1cb4wr7/rho1_not_all_tokens_are_what_you_need_a_very/
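A minimal sketch of Rho-1's Selective Language Modeling as I read the paper (not the authors' code; `keep_ratio` is an illustrative hyperparameter): only tokens whose current-model loss most exceeds a reference model's loss contribute gradient, so training effort concentrates on tokens still worth learning.

```python
import torch
import torch.nn.functional as F

def selective_lm_loss(logits, ref_logits, labels, keep_ratio=0.6):
    """logits / ref_logits: [tokens, vocab]; labels: [tokens].
    keep_ratio is an assumed hyperparameter, not the paper's exact value."""
    loss = F.cross_entropy(logits, labels, reduction="none")
    with torch.no_grad():
        ref_loss = F.cross_entropy(ref_logits, labels, reduction="none")
        excess = loss.detach() - ref_loss       # high = token still worth learning
        k = max(1, int(keep_ratio * excess.numel()))
        mask = torch.zeros_like(excess)
        mask[excess.topk(k).indices] = 1.0      # select the top-k excess-loss tokens
    return (loss * mask).sum() / mask.sum()     # gradients flow on kept tokens only
```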

Smaller, Faster, Cheaper: Introducing Jina Rerankers Turbo and Tiny – https://jina.ai/news/smaller-faster-cheaper-jina-rerankers-turbo-and-tiny/

“If you want to extend the context length of your own models using PoSE, you can use @axolotl_ai. PR should get merged today.” – https://twitter.com/winglian/status/1783469196011016696

“The PoSE (Positional Skip-wisE) technique proposed in this paper is powering the context-length increase of Llama 3 up to 128k. And for extending the context length of your own models using PoSE, there’s a PR (currently pending merge) in @axolotl_ai. (link in 1st comment)…” – https://twitter.com/rohanpaul_ai/status/1783574428858696161
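A minimal sketch of the PoSE idea (my reading of the paper, not the axolotl PR; the two-chunk split and window lengths are assumptions): keep fine-tuning inside the original window, but assign position ids sampled from the longer target window by inserting random skips between chunks, so the model sees relative distances it could never reach otherwise.

```python
import random

def pose_position_ids(train_len=8192, target_len=131072):
    """Return train_len position ids drawn from a simulated target_len window."""
    cut = random.randint(1, train_len - 1)     # split the real window in two chunks
    pos, start = [], 0
    budget = target_len - train_len            # total positional skip available
    for size in (cut, train_len - cut):
        skip = random.randint(0, budget)       # random jump in position space
        budget -= skip
        start += skip
        pos.extend(range(start, start + size))
        start += size
    return pos                                 # len == train_len, max < target_len
```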

MambaByte: Token-Free Language Modeling – YouTube – https://www.youtube.com/watch?v=kcd0BTKJuXk 

“A KerasNLP starter notebook for the Automated Essay Scoring competition on Kaggle.” – https://twitter.com/fchollet/status/1783544742565015954

“XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference. In-context learning (ICL) approaches typically leverage prompting to condition decoder-only language model generation on reference information. Just-in-time processing of a context is inefficient…” – https://twitter.com/_akhaliq/status/1783554087574733294

Heads up! You’ve scrolled to the end of this category. Some categories have only a link or two, so scroll back up and double-check that you didn’t pass anything by.

Be Sure To Read This Week’s Main Post:

This week’s executive overview and top links are here:

AI News #30: Week Ending 04/26/2024 with Executive Summary and Top 39 Links

The post you just read is a deep-dive extension of my weekly newsletter, This Week In AI, an executive summary of the top things to know in AI. Each week, I create an accessible overview so laypeople can feel confident they are conversant with the week’s AI developments. I include a curated list of must-click links of the week, offering everyone a hands-on opportunity to explore the most intriguing updates in artificial intelligence across various categories, including robotics, imagery, video, AR/VR, science, ethics, and more. Beyond the overview, I post topic-based deeper dives like this one. If you haven’t read this week’s overview, I recommend starting there.

Credits/Sources

Most of these weekly links come from just a few prolific oversharing sources. Please follow them, as they work hard to find the news each week and they make it a lot easier for me to compile.

For previous issues, please visit the archives!

Thanks for reading!
