X: AI News Week Ending 02/21/2025

“Improvements will happen rapidly and almost daily according to the team. There is also a Grok-powered voice app coming too — about a week away!” / X https://x.com/omarsar0/status/1891715813956108699

No system card for Grok 3 yet, so no perspectives on risk mitigation. This is especially key for voice, and is why labs have been slow with full multimodal, you can imitate anyone’s voice, and also the AI tended to take your voice and repeat it back to you. From 4o system card: https://x.com/emollick/status/1891745345392058496

BREAKING: Grok’s voice unveiled https://x.com/teslaownersSV/status/1891719294469222495

“LLMs are still incredibly bad at long context, severe drop in response quality from the best of the best models (o1, Claude, grok, DeepSeek), doesn’t really matter what model – it will choke” / X https://x.com/abacaj/status/1893024046469493212

“I can’t believe X users are so stupid. Not voting for o3-mini is insane. You can literally already distill 4o, Claude 3.5, Deepseekv3, etc into sizes that will run on phones.” / X https://x.com/dylan522p/status/1891682135255154775

“After the Grok-3 launch you have to consider xAI as a real competitor for SOTA models. Everything else is just cope. However, internally OpenAI, Anthropic and Google are likely ahead. But honestly not so sure about Google anymore, they need to drop a banger (Pro/Ultra with” / X https://x.com/scaling01/status/1891846484791820502

OpenAI Rejects Elon Musk’s $97.4 Billion Bid for Control of the Company – The New York Times https://www.nytimes.com/2025/02/14/technology/openai-elon-musk.html

“Grok 3 drops tomorrow night—xAI’s billion-dollar bet on scaling. Reminder: xAI built Colossus, the world’s most powerful AI training cluster (100,000+ NVIDIA H100s in just 122 days) to train Grok 3. This comes after DeepSeek-R1 tanked the stock market by delivering a strong https://x.com/rowancheung/status/1891151253951987737

“Based on the early stats, looks like Grok 3 base is going to be a very solid frontier model (leads Chatbot Arena), suggesting pre-training scaling law continues with linear improvements to 10x compute No Reasoner, yet (one is coming?) so GPQA scores are still below o3-mini (77%) https://x.com/emollick/status/1891707120879345788

“The significance of Grok 3, outside of X drama, is that it is the first full model release that we definitely know is at least an order of magnitude larger than GPT-4 class models in training compute, so it will help us understand whether 1st scaling law (pre-training) holds up.” / X https://x.com/emollick/status/1890982179355639881

“I think Grok 3 came in right at expectations, so I don’t think there is much to update in terms of consensus projections on AI: still accelerating development, speed is a moat, compute still matters, no obvious secret sauce to making a frontier model if you have talent & chips.” / X https://x.com/emollick/status/1891749764212900242

“Grok 3 also excels at creative coding like generating creative and novel games. Elon emphasized Grok 3’s creative emergent capabilities. You can also use the Big Brain mode to use more compute and reasoning with Grok 3. https://x.com/omarsar0/status/1891709371802910967

“the grok 3 release made me sad. something fatalistic about falling back to bruteforce scaling — 100x more compute than R1 for a model that’s at most 10% better all that time, money, and electricity spent on a system that will be obsolete before my semester ends AI needs new” / X https://x.com/jxmnop/status/1892725541796446350

“grok-3 is 8e26 FLOPs of training compute” / X https://x.com/ethanCaballero/status/1891712442893312151

BREAKING: Elon Musk just announced xAI’s Gaming Studio. https://x.com/cb_doge/status/1891710861791629406

@elonmusk @DavidSHolz caught elon tweeting during grok3 Q&A 🤣 https://x.com/Yuchenj_UW/status/1891713223591629143

“AI NEWS: Elon Musk’s xAI just unveiled Grok-3 and ranked #1 on the Chatbot Arena. Plus, more news from Mistral’s new regional AI Saba, Ilya’s SSI, Nous Research, and a new open-source Chinese video model. Here’s what you need to know:” / X https://x.com/rowancheung/status/1891773915560583258

“Trying Deep Research with Grok 3. It shows promise, but not there yet. Generally accurate (I spotted a minor hallucination), but not as comprehensive as Google Deep Research nor close to as insightful as OpenAI’s Deep Research in the actual analysis of information. Early days. https://x.com/emollick/status/1892010991250018357

“In the Coding category, Grok-3 surpassed top reasoning models like o1 and Gemini-thinking. https://x.com/lmarena_ai/status/1891706272711381237

“Here are the benchmark numbers: Grok 3 significantly outperforms other models in its category such as Gemini 2 Pro and GPT-4o. Even Grok-3 mini shows to be competitive. https://x.com/omarsar0/status/1891706611023938046

“Is it just me or are the bots back with a vengeance on X? First person to like any of my recent post is a zero follower account with an OF in the bio lol” / X https://x.com/bilawalsidhu/status/1890114989576671332

“I did not get early access, but based on a half-dozen queries this seems right. A very good model that is now at the frontier, but not something that would make you switch from another AI yet The key thing to pay attention to is that X got here very fast & whether that continues” / X https://x.com/emollick/status/1891723774774374400

“This is the one social media site where you can ask an AI in a click about any odd claim and get a reasonably nuanced answer complete with web search results from an AI… and the information environment here is so, so bad. Nobody checks anything. Was true before AI, true now.” / X https://x.com/emollick/status/1891384099199144440

Grok3 Launch / X https://x.com/i/broadcasts/1gqGvjeBljOGB

“A real-world example of replacing Django Admin with 142 lines of Python/fasthtml/monsterui 😀 https://x.com/jeremyphoward/status/1892878733582700781

“Grok3 Unveiled https://x.com/Teknium1/status/1891705665007050851

“SigLIP 2 comes in three sizes (base, large, giant), three patch sizes (14, 16, 32) and shape-optimized variants with Naflex 💘 As usual, supported by transformers from get go! Models: https://x.com/mervenoyann/status/1892870394861789535

“Grok 3 release with live demo on Monday night at 8pm PT. Smartest AI on Earth.” / X https://x.com/elonmusk/status/1890958798841389499

“Grok 3 reasoning beta achieved 96 on AIME and 85 on GPQA, which is on par with the full o3. https://x.com/arankomatsuzaki/status/1891708250199839167

“Grok 3 is a new best model in the world from the @xai team! Grok 3 ranks #1 on Chatbot Arena w/a big gap, and scores impressively on pretraining and reasoning evals. congrats to @elonmusk @ibab @jimmybajimmyba @Yuhu_ai_ looking forward to more partnership on grok4 & beyond 🚀 https://x.com/alexandr_wang/status/1891714169629524126

“BREAKING: xAI announces Grok 3 Here is everything you need to know: https://x.com/omarsar0/status/1891705029083512934

“i used up my whole grok quota on “glub”. this fucking sucks https://x.com/andersonbcdefg/status/1892828532515991786

“i love the janitor, but just accept that Grok-3 is the most powerful PUBLICLY AVAILABLE LLM (at least for a day lol) Look at the condition of the bet. Grok-3 delivered. https://x.com/scaling01/status/1891842735834808708

“Grok 3 involved 10x more training than Grok 2! Grok finished pretraining in early January! The model is still training. https://x.com/omarsar0/status/1891705957220016403

“Elon mentioned that Grok 3 is an order of magnitude more capable than Grok 2. https://x.com/omarsar0/status/1891705031243469270

“Another question that might be answered by Grok 3 – there is some evidence that the impact of AI on productivity is driven by model scale as well, will it continue? (There are lots of reasons, though, that Grok 3 may not be as useful for work as other AI models, so we shall see)” / X https://x.com/emollick/status/1891359072546385995

“BREAKING: @xAI early version of Grok-3 (codename “chocolate”) is now #1 in Arena! 🏆 Grok-3 is: – First-ever model to break 1400 score! – #1 across all categories, a milestone that keeps getting harder to achieve Huge congratulations to @xAI on this milestone! View thread 🧵 https://x.com/lmarena_ai/status/1891706264800936307

“Grok-3 without reasoning actually looks pretty good on these 3 cherry picked benchmarks. It’s also a good sign that they got 1400 Elo n lmsys from the get go. However, I feel like this launch was rather underwhelming. Too few benchmarks, no report and no useful demos. If it’s https://x.com/scaling01/status/1891786871304323280

“Grok 3 on X Premium+ https://x.com/omarsar0/status/1891715441292083572

“a man died to tell us how good grok 3 really is never forget https://x.com/aidan_mclau/status/1891243031090626776

Elon Musk’s Grok 3: Performance, How to Access, and More https://www.analyticsvidhya.com/blog/2025/02/grok-3/

“This is it: The world’s smartest AI, Grok 3, now available for free (until our servers melt). Try Grok 3 now: https://x.com/xai/status/1892400129719611567

Elon Musk says xAI’s Grok 3 chatbot to be unveiled on Monday | Reuters https://www.reuters.com/technology/artificial-intelligence/elon-musk-says-xais-grok-3-chatbot-be-unveiled-monday-2025-02-16/

“Grok 3 https://x.com/emollick/status/1891751665948045758

“Grok 3 Reasoning Beta performance on AIME 2025. Grok 3 shows generalization capabilities. It not only does coding and math problem-solving, but it can also do other creative and useful real-world tasks. https://x.com/omarsar0/status/1891711110476111884

“Grok 3 performance on AIME 2025 (math competition that just finished a few days ago!) https://x.com/iScienceLuvr/status/1891708408832610548

“I was given early access to Grok 3 earlier today, making me I think one of the first few who could run a quick vibe check. Thinking ✅ First, Grok 3 clearly has an around state of the art thinking model (“Think” button) and did great out of the box on my Settler’s of Catan https://x.com/karpathy/status/1891720635363254772

“Another thing Grok 3 highlights is the urgent need for better batteries of tests and independent testing authorities. Public benchmarks are both “meh” and saturated, leaving a lot of AI testing to be like food reviews, based on taste. If AI is critical to to work, we need more.” / X https://x.com/emollick/status/1891862982558187560

Grok 3, xAI’s New Model Family, Improves on its Predecessors, Adds Reasoning https://www.deeplearning.ai/the-batch/grok-3-xais-new-model-family-improves-on-its-predecessors-adds-reasoning/

“Reasoning models like Grok-3 reasoning beta and DeepSeek-R1 are trained using reinforcement learning with verifiable rewards, but what exactly does this mean? Verifiable tasks. One detail that we should immediately notice about reasoning models is that they are primarily used https://x.com/cwolferesearch/status/1891893034956030242

“Grok-3 should be open-sourced. @elonmusk @xai” / X https://x.com/huybery/status/1891712667947057598

“Grok 3 also has reasoning capabilities too! The Grok team has been testing these capabilities which they have unlocked using RL. The model is good, especially in coding. https://x.com/omarsar0/status/1891707915351859547

“If the light blue part is best of N scores, this means that Grok 3 reasoning is inherently an ~o1 level model. This means the capabilities gap between OpenAI and xAI is ~9 months. Also what is the difference between “think” and “big brain” https://x.com/nrehiew_/status/1891710589115715847

“Based on the announcement (& not using the model, yet): 1) X has caught up with the frontier of released models VERY quickly, if they continue to scale this fast, they are a major player 2) Grok 3 is closely following the OpenAI playbook 3) Not sure who will use API at this point” / X https://x.com/emollick/status/1891714787022639373

“xAI arrives at the frontier: Grok 3 is poised to be the world’s new leading model, likely only surpassed by OpenAI’s unreleased o3 model Key takeaways: ➤ Grok 3 is now the leading non-reasoning model, pushing pre-training to new limits ➤ Grok 3 Reasoning likely beats o3-mini https://x.com/ArtificialAnlys/status/1891853619907133702

“In my testing it was at least as good in thinking mode then o3-full deep research was, despite that not being listed here – Interesting to note that grok-3mini seems generally better than full, my guess is that this means they didnt distill full into mini like I assume OpenAI https://x.com/Teknium1/status/1891715974992408738

“small if true with 60k GPUs I can see Grok 3 actually failing to secure primacy even for a day” / X https://x.com/teortaxesTex/status/1891491103674626177

“Grok 3 reasoning models (in beta, still in further training) are better than o3-mini-high, o1, DeepSeek R1 based on preliminary benchmarks https://x.com/iScienceLuvr/status/1891708045324902499

“Grok 3 mini is amazing. We’ll release it soon.” / X https://x.com/ibab/status/1891761914688254340

“Grok 3. The shader worked the first time around, same prompt as o3-mini-high https://x.com/emollick/status/1891956902575104259

“One of the results generated with Grok 3 mini. https://x.com/omarsar0/status/1891711669849505864

“Grok 3 release is a huge event. We’re about to figure out what a cracked team can ship with a significantly bigger cluster than GPT-4 was trained on. Get HYPE 🗣️” / X https://x.com/marktenenholtz/status/1891544044016173475