“‘Move 37’ is the word-of-day – it’s when an AI, trained via the trial-and-error process of reinforcement learning, discovers actions that are new, surprising, and secretly brilliant even to expert humans. It is a magical, just slightly unnerving, emergent phenomenon only…”
https://x.com/karpathy/status/1884336943321997800
“It’s also a testimony that quant companies have so many top talents wasted on $number games. Magic happens if an army of well-disciplined HFT engineers are determined to maximize MMLU, Chatbot Arena, and tokens/sec instead.”
https://x.com/DrJimFan/status/1882778944543617233
YuE
https://map-yue.github.io/
“@ggerganov @Apple My guess is this is related to memory getting unwired after a certain amount of time. Rewiring memory is a relatively expensive operation and will scale with model size. If you are on macOS 15+ you could try setting `sudo sysctl iogpu.disable_wired_collector=1` to test that”
https://x.com/awnihannun/status/1882821315264164118
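A quick way to try the suggestion above is to read the sysctl before and after changing it. The key name `iogpu.disable_wired_collector` is quoted from the tweet (macOS 15+); the Python wrapper below is only an illustrative sketch, and applying the value still requires running the `sudo sysctl` command from a shell.

```python
import subprocess

# Illustrative sketch: read the wired-memory collector sysctl quoted in the
# tweet (macOS 15+; key name taken verbatim from the post, not verified here).
def read_sysctl(key: str = "iogpu.disable_wired_collector") -> str:
    result = subprocess.run(
        ["sysctl", "-n", key], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print(f"iogpu.disable_wired_collector = {read_sysctl()}")
    # Applying the suggested test value needs root, so do it from a shell:
    #   sudo sysctl iogpu.disable_wired_collector=1
```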
“That moment of suspense when you watch your GPU RAM spike and pray you optimized those training parameters correctly… 🙏”
https://x.com/fdaudens/status/1882908439191699559
“Eagle 2, an amazing release from my previous NV mentor @ZhidingYu. Great insights on VLM post-training and data strategies, congrats!”
https://x.com/zizhpan/status/1884179344588955751
arXiv:2501.13452
https://arxiv.org/pdf/2501.13452
“people surprised at V3’s training efficiency show they haven’t been following the Whale saga. With V2, they proved they could train a 236B model 42% faster than their [dense] 67B. V3 drops down to fp8 etc and maintains that <200KH/1T speed for a 3x bigger model. They won’t stop.”
https://x.com/teortaxesTex/status/1883976116949639568
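For context on the “<200KH/1T” shorthand (under 200K H800 GPU-hours per trillion training tokens), here is a back-of-the-envelope check, assuming the figures reported in the DeepSeek-V3 technical report: roughly 2.788M H800 GPU-hours for about 14.8T tokens.

```python
# Back-of-the-envelope check of the "<200KH/1T" claim, assuming the figures
# reported in the DeepSeek-V3 technical report: ~2.788M H800 GPU-hours total
# over ~14.8T training tokens.
total_gpu_hours = 2_788_000  # H800 GPU-hours (reported)
tokens_trillions = 14.8      # training tokens, in trillions (reported)

hours_per_trillion = total_gpu_hours / tokens_trillions
print(f"~{hours_per_trillion:,.0f} H800 GPU-hours per trillion tokens")
# ~188,378 -- comfortably under the 200K-per-trillion ("200KH/1T") mark
```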
“Lesson here is that investors do not understand that the paradigm for AI has been undergoing a shift from one which was about models getting smarter due to more computing power being used for training to models getting smarter due to more computing power being used for inference.”
https://x.com/emollick/status/1883937071393562960
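One concrete form of “compute at inference” is best-of-n sampling: query the same model several times and keep the highest-scoring answer, so quality scales with inference compute rather than with model size. The sketch below is a minimal illustration; `generate` and `score` are hypothetical placeholders, not any specific API.

```python
import random
from typing import Callable

# Minimal sketch of inference-time compute scaling via best-of-n sampling.
# `generate` and `score` are hypothetical placeholders, not a real API.
def best_of_n(
    generate: Callable[[str], str],
    score: Callable[[str], float],
    prompt: str,
    n: int = 8,
) -> str:
    # Spend n forward passes at inference, keep the highest-scoring sample.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    toy_generate = lambda p: f"{p} {random.random():.3f}"
    toy_score = lambda s: float(s.rsplit(" ", 1)[-1])
    print(best_of_n(toy_generate, toy_score, "answer:", n=4))
```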
“We have also prepared a self-contained version of the web app, with the model weights included, for those who want to run it fully locally. GitHub: …”
https://x.com/SakanaAILabs/status/1884880970790343001
syncanimation.github.io
https://syncanimation.github.io/
codename goose
https://block.github.io/goose/
“🚀 New in LangSmith: Bulk View for Annotation Queues When working with large datasets for model training, managing thousands of annotations can be overwhelming. In LangSmith, you can now: • View multiple annotation runs at once for a high-level overview of the queue • Quickly …”
https://x.com/LangChainAI/status/1885003940661743999
“Machines will train machines. Never bet against scaling. Never.”
https://x.com/DrJimFan/status/1883961057074634885
“I think ‘RL’ does not mean anything anymore tbh.”
https://x.com/francoisfleuret/status/1884327414060507565
“my tech stack is not decided by performance, closeness to machine code, or anything smart, it’s all about comfiness. jax? comfy. nix? comfy. tensorboard? comfy. jupyter notebooks? mega comfy. i want my work to feel comfy”
https://x.com/qtnx_/status/1884258354757042321
“I also heard they divide every number by two so it uses fewer cycles later on during the matmuls”
https://x.com/giffmana/status/1883661880822284792
“I’ve been arguing for something like this for over a year: crowdsourced distributed fine-tuning.”
https://x.com/ylecun/status/1884692313118495221
“In The Coming Wave I argued a core attribute of technology is its tendency to get cheaper, more efficient + diffuse ever wider. It’s a deeply entrenched historical pattern, it’s shaped our world at a profound level + it’s still going today…this is what we’re seeing. At warp speed”
https://x.com/mustafasuleyman/status/1885042373757198811
“@karpathy > ‘cognitive strategies’ – things like approaching a problem from different angles, trying out different ideas, finding analogies, backtracking, re-examining, etc A few years from now we will look back and think how crazy it was that we were manually doing things like CoT, …”
https://x.com/omarsar0/status/1884339091401211938
“here’s a good thread about Implicit CoT from its author: …”
https://x.com/jxmnop/status/1882830393373774310
“See the full leaderboard and try the models yourself: …”
https://x.com/lmarena_ai/status/1882875995503636640