“Chain-of-Associated-Thoughts (CoAT) is a new framework that enhances LLMs’ reasoning abilities by combining Monte Carlo Tree Search with dynamic knowledge integration. The framework addresses the limitations of existing “fast thinking” approaches by introducing an “associative
https://x.com/omarsar0/status/1887187689247752370

“LIMO: Less is More for Reasoning Achieves 57.1% on AIME and 94.8% on MATH w/ only 817 training samples, i.e., only 1% of the training data required by previous approaches
https://x.com/arankomatsuzaki/status/1887353699644940456

AI Memory And Context: Open Source, DeepSeek, Meta, And Model Research
https://www.forbes.com/sites/johnwerner/2025/01/29/ai-memory-and-context-open-source-deepseek-meta-and-model-research/

“🤖🎯 Research Canvas ANA An AI research canvas that transforms complex research with human-guided LLMs. Built on LangGraph, it combines real-time search and an interactive canvas to streamline your research workflow. Learn more:
https://x.com/LangChainAI/status/1885357379396456684

“You will never guess the report that took Deep Research the longest to generate of any I made before. 30 pages, 10,600 words, and actually super interesting to read. (link to full document in reply)
https://x.com/emollick/status/1886558832681476359

“You: But I don’t know what to do with a super-smart AI capable of conducting graduate-level research. Me:
https://x.com/emollick/status/1886657674701353407

“Deep research is now rolled out to all pro users!” / X
https://x.com/markchen90/status/1886341752245915903

“🚢 This team SHIPS. Just today: – open-sourced Deep Research in 24h, – made 400k AI Spaces searchable, unlocking the app store of AI – ported the first foundational model on @LeRobotHF – showed w/@Adyen that LLMs still fail at real data analysis normal @huggingface Tuesday 😅” / X
https://x.com/fdaudens/status/1886889184423719117

“We’re just starting to understand how AI “think” during inference. Paper shows current large models often abandon promising lines of reasoning too quickly, “underthinking”. When they get math problems wrong, it’s often because they jumped away from correct approaches too early.
https://x.com/emollick/status/1885309072259297709

“Deeper Research with LLMs becomes more relevant. STORM, or “Synthesis of Topic Outlines through Retrieval and Multi-perspective,” is a paper that proposes multi-question, iterative research with verifiable content generation, very similar to @GoogleDeepMind Gemini Deep Research and
https://x.com/_philschmid/status/1887085743131984029

(WIP) A Little Bit of Reinforcement Learning from Human Feedback
https://rlhfbook.com/c/11-policy-gradients.html

“i changed my mind. RL is easy
https://x.com/andersonbcdefg/status/1886319033949262245

“looks like RL is the future. stop grinding leetcode, kiddo. start grinding cartpole-v1” / X
https://x.com/andersonbcdefg/status/1885222307788185725

“Interesting talk/slides on a method by which AIs can self-improve and generalize.” / X
https://x.com/emollick/status/1886095671041609960

“Looks like a ~60-70% VRAM reduction change to GRPO in TRL is coming soon!
https://x.com/nrehiew_/status/1885184764539273574

“Reasoners break the value of chain-of-thought as a prompting technique and often lower the value of giving AI a persona. No new approaches to prompting reasoners have been tested in any sort of rigorous way (though people often share their own “best practices,” they may not work)” / X
https://x.com/emollick/status/1886162428766879790

“Seen some misconceptions that GRPO is the process used to RL R1 on verifiable rewards. Thought it’s worth clarifying. GRPO gets rid of the Value Model and NOT the Reward Model. This is the main insight since you save memory. The main change between PPO and GRPO is the way the
https://x.com/nrehiew_/status/1885079616248832090
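The group-relative trick the thread describes fits in a few lines. A minimal sketch (plain Python, toy rewards; not the full GRPO objective): each completion’s advantage is its reward normalized by the mean and std of its own sampled group, which is exactly what lets GRPO drop the learned value model while keeping the reward model.

```python
from statistics import mean, stdev

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: normalize each completion's reward
    against the mean/std of its own group (no learned value model)."""
    mu = mean(rewards)
    sd = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sd + eps) for r in rewards]

# One prompt, a group of 4 sampled completions scored by the reward model:
adv = grpo_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions above the group mean get positive advantage, below get negative; the baseline is statistical rather than learned, which is where the memory saving comes from.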

“Excited to share this work with a bunch of controlled experiments to better understand long chains of thought! Some insights on: 1. Short CoT vs. long CoT 2. The role of SFT vs RL 3. How to tweak reward to manage length 4. Measuring “branching” behavior etc etc” / X
https://x.com/gneubig/status/1887495037820567815

“LIMO: Less is More for Reasoning
https://x.com/_akhaliq/status/1887372529112686810

“Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
https://x.com/_akhaliq/status/1887373223152492665

“Announcing How Transformer LLMs Work, created with @JayAlammar and @MaartenGr, co-authors of the beautifully illustrated book, “Hands-On Large Language Models.” This course offers a deep dive into the inner workings of the transformer architecture that powers large language
https://x.com/AndrewYNg/status/1887184924165492940

“Incredibly excited to introduce to you the course @JayAlammar and I have been working on! We love creating visuals and this course is no exception. Expect many animations and illustrations detailing the ins and outs of LLMs. Working together with @AndrewYNg has been an honor!” / X
https://x.com/MaartenGr/status/1887192134937190624

“💭🔎 Introducing EvalPlanner – a method to train a Thinking-LLM-as-a-Judge that learns to generate planning & reasoning CoTs for evaluation. Strong performance on RewardBench, RM-Bench, JudgeBench & FollowBenchEval. Paper 📄:
https://x.com/jaseweston/status/1885153770662760472

Interaction Processing Units
https://nilscrm.github.io/ipu.html

MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
https://monst3r-project.github.io/

[2501.19400v1] Vintix: Action Model via In-Context Reinforcement Learning
https://arxiv.org/abs/2501.19400v1

[2502.02390] CoAT: Chain-of-Associated-Thoughts Framework for Enhancing Large Language Models Reasoning
https://arxiv.org/abs/2502.02390

“s1: Simple test-time scaling “We show that training on only 1,000 samples with next-token prediction and controlling thinking duration via a simple test-time technique we refer to as budget forcing leads to a strong reasoning model that scales in performance with more test-time
https://x.com/iScienceLuvr/status/1886249466203910418
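Budget forcing as described in s1 is simple enough to sketch. Below is a toy decode loop, with `generate_step` as a hypothetical stand-in for one decoding step: if the model tries to end its thinking before a minimum budget, the literal token “Wait” is appended to suppress the stop and force more reasoning; past the maximum budget, thinking is cut off.

```python
def budget_force(generate_step, max_thinking_tokens, min_thinking_tokens=0):
    """Toy sketch of s1-style budget forcing.

    `generate_step(tokens)` stands in for one decoding step; it returns
    the next token string, or "</think>" when the model wants to stop
    thinking.
    """
    tokens = []
    while True:
        nxt = generate_step(tokens)
        if nxt == "</think>":
            if len(tokens) < min_thinking_tokens:
                tokens.append("Wait")   # suppress early stop, keep thinking
                continue
            break
        tokens.append(nxt)
        if len(tokens) >= max_thinking_tokens:
            break                       # hard cap: force end of thinking
    return tokens + ["</think>"]
```

The appeal of the technique is that it needs no training change at all; it is purely a decoding-time intervention, which is why performance scales with the test-time budget.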

“The paper introduces CodeMonkeys, a system designed to improve Large Language Model (LLM) performance in solving software engineering tasks by efficiently scaling test-time computation. Amortized context, parallel attempts CodeMonkeys uses iterative refinement and parallel
https://x.com/rohanpaul_ai/status/1885509454466343273

“This work was realized in the context of @tom_labiausse’s Master internship, and he will keep working on Hibiki as a resident PhD student at Kyutai. This project was funded by @GroupeIliad , @cmacgm and @schmidtsciences. We thank them for their support. Paper:” / X
https://x.com/kyutai_labs/status/1887495511474573517

“This paper introduces In-Context Reinforcement Learning (ICRL). It demonstrates that a transformer LLM, when trained with RL, can learn to solve new problems it has never seen before, by leveraging in-context experience. —— 📌 ICRL Enables On-the-Fly Adaptation Traditional
https://x.com/rohanpaul_ai/status/1886748811613364470

“💀 Introducing RIP: Rejecting Instruction Preferences💀 A method to *curate* high quality data, or *create* high quality synthetic data. Large performance gains across benchmarks (AlpacaEval2, Arena-Hard, WildBench). Paper 📄:
https://x.com/jaseweston/status/1885160135053459934

[2502.01628] Harmonic Loss Trains Interpretable AI Models
https://arxiv.org/abs/2502.01628

stanford-oval/storm: An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
https://github.com/stanford-oval/storm

“The paper introduces CLoQ, a calibration-based LoRA initialization for quantized Large Language Models. CLoQ minimizes the discrepancy between original and quantized LLMs by optimizing LoRA components, improving fine-tuning performance, especially at low bit-widths. ——–
https://x.com/rohanpaul_ai/status/1886742554630349059
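One common way to realize this kind of quantization-aware LoRA init can be sketched quickly. This is a simplification: CLoQ’s actual objective is data-calibrated, while the sketch below just takes a truncated SVD of the quantization residual so the low-rank factors absorb the quantization error (closer in spirit to LoftQ-style initialization).

```python
import numpy as np

def quantize(w, bits=4):
    """Naive symmetric round-to-nearest quantizer (illustrative only)."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def residual_lora_init(W, bits=4, rank=8):
    """Sketch of a calibration-style LoRA init: pick A, B so that
    Wq + B @ A approximates the original W, i.e. the low-rank factors
    absorb the quantization error."""
    Wq = quantize(W, bits)
    U, S, Vt = np.linalg.svd(W - Wq, full_matrices=False)
    B = U[:, :rank] * S[:rank]   # (out_dim, rank)
    A = Vt[:rank]                # (rank, in_dim)
    return Wq, A, B
```

Starting fine-tuning from factors that already cancel part of the quantization error is what makes the difference at low bit-widths, where that error dominates.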

“🚨 Diverse Preference Optimization (DivPO) 🚨 SOTA LLMs have model collapse🫠: they can’t generate diverse creative writing or synthetic data 🎨 DivPO trains for both high reward & diversity, vastly improving variety with similar quality. Paper 📝:
https://x.com/jaseweston/status/1885399530419450257
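The selection rule behind DivPO can be sketched in a few lines. Assuming hypothetical `reward_fn` and `diversity_fn` scorers: the “chosen” response is the most diverse one inside the top reward fraction of the pool, and the “rejected” response is the least diverse one inside the bottom fraction, so training pushes toward both quality and variety.

```python
def divpo_pair(responses, reward_fn, diversity_fn, rho=0.25):
    """Sketch of DivPO-style preference pair selection from a pool of
    sampled responses. reward_fn / diversity_fn are assumed scorers."""
    ranked = sorted(responses, key=reward_fn, reverse=True)
    k = max(1, int(len(ranked) * rho))
    chosen = max(ranked[:k], key=diversity_fn)      # most diverse of the best
    rejected = min(ranked[-k:], key=diversity_fn)   # least diverse of the worst
    return chosen, rejected
```

Standard preference optimization then runs unchanged on these pairs; only the pair construction differs from vanilla DPO.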

“-> RL improves models’ adaptability to new tasks -> SFT leads to memorization but remains important for model stabilization That’s what researchers from @GoogleDeepMind, @nyuniversity, @UCBerkeley @HKUniversity found, exploring how actually Reinforcement Learning (RL) and
https://x.com/TheTuringPost/status/1886465061763604844

“I’d like to see a benchmark for test-time scaling that evaluates methods on how well they can do with only *1K* examples.
https://x.com/percyliang/status/1886490467497553944

[2502.03382] High-Fidelity Simultaneous Speech-To-Speech Translation
https://arxiv.org/abs/2502.03382

“some folks were asking how to run gsm8k “test” eval during GRPO training, one way to do it is by extending the GRPOTrainer and writing a custom trainer to do the eval every N steps. check the gist I wrote below to see what I mean
https://x.com/abacaj/status/1887206493700645300
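Stripped of the TRL specifics, the pattern in that gist is just an eval hook on a step counter. A framework-agnostic sketch, with `train_step` and `eval_fn` as placeholders for the real GRPO update and the gsm8k test eval:

```python
def train_with_periodic_eval(train_step, eval_fn, total_steps, eval_every):
    """Run a held-out eval every `eval_every` optimizer steps and
    record (step, score) pairs alongside training."""
    history = []
    for step in range(1, total_steps + 1):
        train_step(step)
        if step % eval_every == 0:
            history.append((step, eval_fn()))
    return history
```

In TRL itself this lands inside a `GRPOTrainer` subclass rather than a bare loop, but the control flow is the same.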

Paper page – Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
https://huggingface.co/papers/2402.14207

[2501.03936v1] PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides
https://arxiv.org/abs/2501.03936v1

“→ Mixture-of-Agents systems use multiple different LLMs to improve output. → However, balancing the quality and diversity of these LLMs is hard. Including weaker models can reduce overall performance. This paper proposes Self-MoA which aggregates outputs from a single,
https://x.com/rohanpaul_ai/status/1887494487460716866
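The Self-MoA idea reduces to repeated sampling from one strong model plus aggregation, instead of mixing different models of uneven quality. A minimal sketch, with `sample_fn` as an assumed stand-in for one LLM call and majority vote standing in for the paper’s LLM aggregator:

```python
from collections import Counter

def self_moa(sample_fn, n_samples=8):
    """Self-MoA sketch: draw several samples from a *single* model and
    aggregate them (here by majority vote over final answers)."""
    samples = [sample_fn() for _ in range(n_samples)]
    answer, _ = Counter(samples).most_common(1)[0]
    return answer
```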

[2501.19383v1] Decoding-based Regression
https://arxiv.org/abs/2501.19383v1

“How biased are LLMs when you use them for synthetic data generation and as LLM-as-a-Judge to evaluate? Answer: Significantly biased. 👀 The “Preference Leakage: A Contamination Problem in LLM-as-a-judge” paper shows that using the same LLM, family or even previous version can
https://x.com/_philschmid/status/1886717030218297406

[2501.19201v1] Efficient Reasoning with Hidden Thinking
https://arxiv.org/abs/2501.19201v1

Customers don’t care about your AI feature
https://www.growthunhinged.com/p/ai-messaging-study

[2501.18965] The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training
https://arxiv.org/abs/2501.18965

“@swyx @GoogleDeepMind I think you need to use another benchmark to really appreciate the capabilities of Pro relative to other models, its world class for deep coding problems” / X
https://x.com/OfficialLoganK/status/1887269355919917182

[2501.19393] s1: Simple test-time scaling
https://arxiv.org/abs/2501.19393

[2412.16247] Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models
https://arxiv.org/abs/2412.16247

“Everyone wants to teach “AI skills” but nobody knows what “AI skills” are as of today, let alone for the future. We can teach a bit about how LLMs work, and give some advice on prompting, but, beyond that, what are people supposed to learn to make them “AI ready”? No clear vision” / X
https://x.com/emollick/status/1885054259860824131

“New 3h31m video on YouTube: “Deep Dive into LLMs like ChatGPT” This is a general audience deep dive into the Large Language Model (LLM) AI technology that powers ChatGPT and related products. It covers the full training stack of how the models are developed, along with mental
https://x.com/karpathy/status/1887211193099825254

“We’re ecstatic to bring you “How Transformer LLMs Work” — a free course with ~90 minutes of video, code, and crisp visuals and animations that explain the modern Transformer architecture, tokenizers, embeddings, and mixture-of-expert models. @MaartenGr and I have developed a” / X
https://x.com/JayAlammar/status/1887189786672202233

“The paper introduces DINT Transformer to enhance LLM attention mechanisms. It addresses limitations of DIFF Transformer by incorporating global context and ensuring numerical stability. This leads to improved performance in long-context tasks and key information retrieval. 📌
https://x.com/rohanpaul_ai/status/1887135073130172669

Discover more from Ethan B. Holland
