“Mistral released Codestral 25.01, a 2x faster, lightweight coding AI that achieves high performance across 80+ programming languages. It supports tasks like code correction and test generation, and is currently ranked #1 on the Copilot Arena leaderboard.
https://x.com/adcock_brett/status/1881024808609100106
“Chinese lab MiniMax launched two AI models with context windows of 4M tokens. Using a new ‘Lightning Attention’ approach, the models perform comparably with top models on academic benchmarks. They mark a push toward agents with extensive memory.
https://x.com/adcock_brett/status/1881024786190463073
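The links above don't specify how MiniMax's ‘Lightning Attention’ works internally, but it belongs to the linear-attention family. As a rough sketch only (not MiniMax's actual kernel; the feature map `phi` here is a hypothetical placeholder), the key idea is reassociating the attention matmul so cost drops from O(n²·d) to O(n·d²), which is what makes multi-million-token windows plausible:

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: materializes an (n, n) score matrix -> O(n^2 * d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    # Kernelized attention: phi(Q) @ (phi(K)^T V) reassociates the matmul,
    # so cost is O(n * d^2) and no (n, n) matrix is ever formed.
    Kp, Qp = phi(K), phi(Q)
    KV = Kp.T @ V             # (d, d) summary of the whole sequence
    Z = Qp @ Kp.sum(axis=0)   # per-query normalizer (positive by construction)
    return (Qp @ KV) / Z[:, None]

n, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, n, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (1024, 64)
```

The (d, d) summary `KV` is what an agent with "extensive memory" would carry forward: its size is independent of sequence length.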
“🚨 NEW: Marc Andreessen on China’s manufacturing dominance: “There’s three industries that follow phones that the Chinese own the global market at: 1) Drones. Something over 90% of all the consumer drones are made in China, which is what the US military also uses. It’s the whole
https://x.com/AutismCapital/status/1879396919107313836
“DeepSeek showed us in just 4 days: – Open-source AI is only <6 months behind closed AI – China is leading the open-source AI race (was not on my bingo card) – we are entering the LLM RL golden era – distilled models are powerful, we’ll have highly intelligent AI running locally”
https://x.com/Yuchenj_UW/status/1882840436974428362
“🚀 DeepSeek-R1 is here! ⚡ Performance on par with OpenAI-o1 📖 Fully open-source model & technical report 🏆 MIT licensed: Distill & commercialize freely! 🌐 Website & API are live now! Try DeepThink at https://t.co/v1TFy7LHNy today! 🐋 1/n”
https://x.com/deepseek_ai/status/1881318130334814301
“DeepSeek’s first-generation reasoning models are achieving performance comparable to OpenAI’s o1 across math, code, and reasoning tasks! Give it a try! 👇 7B distilled: ollama run deepseek-r1:7b More distilled sizes are available. 🧵
https://x.com/ollama/status/1881427522002506009
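The R1 distills served through Ollama emit their chain of thought inside `<think>…</think>` tags before the final answer. A small helper, assuming that output format, to separate reasoning from the answer:

```python
import re

def split_r1_output(text: str):
    """Split a DeepSeek-R1-style completion into (reasoning, answer).

    Assumes the model wraps its chain of thought in <think>...</think>,
    as the R1 distills do.
    """
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return "", text.strip()
    reasoning = m.group(1).strip()
    answer = text[m.end():].strip()
    return reasoning, answer

raw = "<think>7 is prime since no integer in 2..6 divides it.</think>Yes, 7 is prime."
thought, final = split_r1_output(raw)
print(final)  # Yes, 7 is prime.
```

Useful when you want to show users only the answer while logging the full trace.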
(2) DeepSeek R1’s recipe to replicate o1 and the future of reasoning LMs
https://www.interconnects.ai/p/deepseek-r1-recipe-for-o1
“Reinforcement Learning is all you need! @deepseek_ai R1 an open model that rivals @OpenAI o1 and other models on complex reasoning tasks just got released. But how is it trained? 👀 DeepSeek combines reinforcement learning with multi-stage training to achieve reasoning abilities
https://x.com/_philschmid/status/1881420703721009192
“PSA: It takes <2 minutes to set up R1 as a free+offline coding assistant 💁♀️ Big shout out to @lmstudio and @continuedev! 🫶 https://t.co/ThdxcTF9e4”
https://x.com/priontific/status/1881668130470285379
“R1 Cold Start → R1 Reasoner with RL (Stage 2/4) 🚀Train Stage 1 model with GRPO: Use data from stage 0 and add a language consistency rule (target lang % in CoT). 💡Emergent: readable reasoning with reflection + long CoT.
https://x.com/casper_hansen_/status/1881404614190506188
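The "language consistency rule (target lang % in CoT)" above can be sketched as a reward-shaping term. This is a toy proxy only (the paper's exact reward isn't reproduced here, and `isascii` is a crude stand-in for real language identification):

```python
def language_consistency(cot: str, target="english") -> float:
    # Toy proxy: fraction of alphabetic characters that are ASCII (English).
    letters = [c for c in cot if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c.isascii() for c in letters) / len(letters)

def reward(answer_correct: bool, cot: str, lam=0.1) -> float:
    # Correctness dominates; a small bonus for keeping the CoT in one language.
    return float(answer_correct) + lam * language_consistency(cot)

print(reward(True, "First, factor the polynomial..."))  # 1.1
mixed = "First, 我们 factor the polynomial..."
assert reward(True, mixed) < 1.1  # mixed-language CoT scores lower
```

The tradeoff noted in the R1 report is that such a penalty slightly reduces raw accuracy in exchange for readable, single-language reasoning.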
Mark Lord on X: “PSA: It takes <2 minutes to set up R1 as a free+offline coding assistant 💁♀️ Big shout out to @lmstudio and @continuedev! 🫶 https://t.co/ThdxcTF9e4” / X – https://x.com/priontific/status/1881668130470285379
“R1 Reasoning → R1 Finetuned-Reasoner (Stage 3/4) 🚀Generate 600k samples: multi-response sampling, keeping only correct samples (using the previous rules) ⚙️V3 as a judge: filter out mixed languages, long paragraphs, and code 🌐Generate 200k general-purpose samples via V3 🔥Finetune model”
https://x.com/casper_hansen_/status/1881404617235509711
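The stage-3 recipe above — sample many responses per prompt, keep only the verifiably correct ones — is plain rejection sampling. A minimal sketch, with stub stand-ins for the model and the rule-based checker:

```python
import random

def rejection_sample(prompt, generate, is_correct, n=8):
    """Keep only verifiably correct samples, as in R1's curated SFT set.

    `generate` and `is_correct` are hypothetical stand-ins for model
    sampling and the rule-based checkers from the earlier RL stages.
    """
    kept = []
    for _ in range(n):
        response = generate(prompt)
        if is_correct(prompt, response):
            kept.append(response)
    return kept

# Toy demo: the "model" sometimes answers 2+2 wrong; only correct samples survive.
random.seed(0)
generate = lambda p: str(random.choice([3, 4, 4, 5]))
is_correct = lambda p, r: r == "4"
samples = rejection_sample("2+2=?", generate, is_correct, n=100)
print(all(s == "4" for s in samples))  # True
```

In the real pipeline the V3-as-judge filters (mixed languages, long paragraphs, code) would be additional predicates applied before keeping a sample.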
“R1 Zero → R1 Finetuned Cold Start (Stage 1/4) 🚀Generate 1-10k long CoT samples: Use R1 Zero with few-shot prompting ⚙️Supervised finetuning using model from stage 0 💡Result: Readable thoughts + structured outputs.
https://x.com/casper_hansen_/status/1881404611401236745
“DeepSeek R1 thinks for around 75 seconds and successfully solves this ciphertext problem from OpenAI’s o1 blog post. https://t.co/nI3vzwysH2”
https://x.com/mrsiipa/status/1881330071874813963
“Wow, DeepSeek R1 Distill Qwen 7B (in 4-bit) nailed the first hard math question I asked it. Thought for ~3200 tokens in about 35 seconds on M4 Max with mlx-lm.
https://x.com/awnihannun/status/1881386796266946743
(2) Everything you need to run Mission Critical Inference (ft. DeepSeek v3 + SGLang)
https://www.latent.space/p/baseten
“📜 License Update! 🔄 DeepSeek-R1 is now MIT licensed for clear open access 🔓 Open for the community to leverage model weights & outputs 🛠️ API outputs can now be used for fine-tuning & distillation 🐋 3/n”
https://x.com/deepseek_ai/status/1881318138937233664
“Here’s my attempt at visualizing the training pipeline for DeepSeek-R1(-Zero) and the distillation to smaller models. Note they retrain DeepSeek-V3-Base with the new 800k curated data instead of continuing to finetune the checkpoint from the first round of cold-start SFT + RL
https://x.com/SirrahChan/status/1881488738473357753
“I think it’s chilling for Dario, who’s good with numbers, to know inside that DeepSeek did mog him with a 2K H800s cluster and some P20s+Ascends. The math in the papers checks out. Nobody can tell what the fuck those mythical 50K H100s are up to. But if they stop being mythical…”
https://x.com/teortaxesTex/status/1882222592800739546
“DeepSeek R1 has landed on HuggingChat! (DeepSeek-R1-Distill-Qwen-32B version)
https://x.com/fdaudens/status/1881737288066961567
“Most AI researchers I talk to have been a bit shocked by DeepSeek-R1 and its performance. My preliminary understanding nuggets: 1. Simple post-training recipe called GRPO: Start with a good model and reward for correctness and style outcomes. No PRM, no MCTS, no fancy reward
https://x.com/AlexGDimakis/status/1881511481164079507
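The "no PRM, no MCTS, no fancy reward" point above is the heart of GRPO: instead of a learned critic, it estimates each completion's advantage by normalizing its reward against the other samples drawn for the same prompt. A minimal sketch of that group-relative normalization:

```python
def grpo_advantages(rewards):
    """GRPO's critic-free advantage estimate: normalize each sampled
    completion's reward against its own group's mean and std."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5 or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Eight completions for one prompt, rewarded 1.0 if correct, else 0.0:
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0])
print([round(a, 2) for a in advs])  # correct answers get positive advantage
```

Correct completions in a mostly-wrong group get a large positive advantage, which is exactly the "reward for correctness" pressure the tweet describes — no process reward model required.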
How has DeepSeek improved the Transformer architecture? | Epoch AI
https://epoch.ai/gradient-updates/how-has-deepseek-improved-the-transformer-architecture
“R1 Instruct-Reasoner → R1 Aligned (Stage 4/4) ⚖️Align DeepSeek-R1: Balance reasoning with helpfulness and harmlessness using GRPO 🔍 Data Strategy: rule-based rewards for math/code + reward model for human preferences. 🌟Result: DeepSeek R1
https://x.com/casper_hansen_/status/1881404619362013294
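The stage-4 data strategy above — rule-based rewards for verifiable tasks, a learned reward model for human preferences — amounts to a reward dispatcher. A sketch under stated assumptions (`rule_check` and `preference_model` are hypothetical stand-ins, not DeepSeek's actual components):

```python
def combined_reward(task_type, prompt, response, rule_check, preference_model):
    """Route verifiable tasks (math/code) to rule-based checks and
    open-ended tasks to a learned preference reward model."""
    if task_type in ("math", "code"):
        return 1.0 if rule_check(prompt, response) else 0.0
    return preference_model(prompt, response)

# Toy stand-ins:
rule_check = lambda p, r: r.strip() == "4"
preference_model = lambda p, r: 0.8 if "thank" in r.lower() else 0.2

print(combined_reward("math", "2+2=?", "4", rule_check, preference_model))        # 1.0
print(combined_reward("chat", "hi", "Thanks for asking!", rule_check, preference_model))  # 0.8
```

Keeping the verifiable rewards rule-based avoids reward hacking on math/code while still letting preference data shape helpfulness and harmlessness.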
“We retrained Hermes with 5k DeepSeek R1 distilled CoTs. I can confirm a few things: 1. You can have a generalist + reasoning mode; we labeled all longCoT samples from R1 with a static system prompt. The model, when not using it, gives normal fast LLM intuitive responses, and with,”
https://x.com/Teknium1/status/1882893748742598669
“I think academics do need to start writing for AI. Here is a bit of the internal monologue of DeepSeek R1 when I asked it to come up with a sociological theory (the details don’t matter). It came up with a name and then realized it was already the name of a theory proposed by March
https://x.com/emollick/status/1881545492712316950
“The raw chain of thought from DeepSeek is fascinating, really reads like a human thinking out loud. Charming and strange.
https://x.com/emollick/status/1881423029160575474
“No matter how much you fight it, I find that the visible chain-of-thought from DeepSeek makes it nearly impossible to avoid anthropomorphizing the thing. The visible first-person “thinking” makes you feel like you are reading a diary of a somewhat tortured soul who wants to help
https://x.com/emollick/status/1881904723026210985
deepseek-ai/DeepSeek-R1 · Hugging Face
https://huggingface.co/deepseek-ai/DeepSeek-R1
“The release of DeepSeek-R1 demonstrates that, for better or worse, any attempt to restrict access to AI by governments is unlikely to work. You can get an open frontier model on a USB stick, and the methods outlined by DeepSeek suggest pathways forward for other open models, too.”
https://x.com/emollick/status/1881405036926001580
“DeepSeek is a side project 🔥
https://x.com/hardmaru/status/1882698763988545808
DeepSeek-R1/DeepSeek_R1.pdf at main · deepseek-ai/DeepSeek-R1
https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf
“I asked #R1 to visually explain to me the Pythagorean theorem. This was done in one shot with no errors in less than 30 seconds. Wrap it up, it’s over: #DeepSeek #R1
https://x.com/christiancooper/status/1881335734256492605
“Summary of the DeepSeek models released today! DeepSeek-R1-Zero > Base Model: DeepSeek-V3-Base > Training Approach: Pure reinforcement learning (RL) without any supervised fine-tuning (SFT) as a preliminary step > RL Algorithm: Group Relative Policy Optimization (GRPO), which
https://x.com/reach_vb/status/1881412831306002897