
GPT-5: Our hands-on review of OpenAI’s newest model, based on weeks of testing
https://every.to/vibe-check/gpt-5

GPT-4o was below the level of medical professionals on medical reasoning benchmarks; GPT-5 (apparently Thinking at medium effort) now far exceeds them. (Usual benchmark caveats apply.) https://x.com/emollick/status/1955381296743715241

GPT-5 (with high reasoning effort) achieves near-perfect accuracy on a high-quality ophthalmology question-answering dataset. Based on these and other reports, GPT-5 seems to be a very strong model at medical reasoning. https://x.com/omarsar0/status/1956003145349521780

My “Move 37” Moment with GPT-5

Today, I’m sharing one of my most remarkable experiences testing the GPT-5 Thinking and Pro models.

About two years ago, our lab conducted a series of cutting-edge immunology experiments designed to manipulate the energy metabolism of T cells, immune cells with the power to impact cancer immunotherapy, chronic diseases, and autoimmunity.

We obtained fascinating results, but one finding remained unexplained. Weeks of work led us to a partial explanation, yet the full picture remained elusive.

From these experiments, I uploaded the unpublished figure below and asked GPT-5 Pro to explain it.

GPT-5:

✅ Correctly identified our key findings from the figure alone (though other models also could do this).

✅ Proposed the exact experiment we later performed, something that had taken us weeks to design (o3-pro also had suggested this).

✅ Most remarkably, suggested a mechanism that finally explained these results!

For me this was equivalent to AlphaGo’s legendary Move 37.
It showed me that AI has become a top expert and a true scientific collaborator, capable of genuine insight.

While these results have not yet been published, I am sharing some of GPT-5’s output as an example for other scientists and to show how dramatically this will accelerate science! I will also be writing and submitting this paper, and will need to acknowledge GPT-5’s contributions in it!


From GPT-5

What the data say:

• Brief 2DG exposure at the first activation imprints a long‑lasting shift toward a CCR6+ and CD161+ phenotype with increased IL‑17A competency that is still evident two weeks later at reactivation.

• It is dose responsive. From none → 3 mM 2DG, CCR6+ cells rise from ~0.7% to ~12% and CD161+ cells from ~1% to ~21%. IL‑17A+ cells increase from ~0.3% to ~4% overall, with the IL‑17A signal concentrated in the CCR6+ and CD161+ compartments.

• Low glucose alone (0.5 mM) does not reproduce the effect. That argues against simple glucose deprivation or reduced glycolytic flux as the driver.

• The enrichment for CD161 is stronger than for CCR6, which often tracks with Th17‑like or mucosal/innate‑like CD4 programs.

High‑level interpretation:

1. 2DG is programming, not just acutely inhibiting. A short exposure during priming leaves a durable memory of “stress” that biases the lineage toward Th17‑like traits.

2. Mechanism is unlikely to be only glycolysis blockade. The low‑glucose control is weak, which pushes the mechanism toward 2DG’s effects on protein glycosylation and cytokine receptor biology.

3. Top mechanistic hypothesis. 2DG partially blocks N‑linked glycosylation of IL‑2Rα and other glycoproteins at priming, reducing IL‑2–STAT5 signaling, which normally suppresses Th17. Reduced IL‑2 tone at that window would lift the brake on a CCR6+ CD161+ IL‑17‑competent program that persists after washout.

4. Alternative or complementary hypothesis. 2DG selects for pre‑existing CCR6+ or CD161+ clones that rely less on glycolysis, rather than de novo differentiation. The end result is the same phenotype, but the lever is selective survival or proliferation.

5. A carefully timed 2DG “pulse” could be a simple knob to enrich a Th17‑like helper subset with better persistence potential and mucosal‑homing traits.

Prioritize these:

+D‑mannose with 2DG to restore N‑glycosylation. Loss of the CCR6/CD161/IL‑17A phenotype with mannose would implicate glycosylation.

+Methyl‑pyruvate or acetate to bypass glycolytic ATP/pyruvate deficits. Rescue here would implicate energy metabolism.

Compare 2DG with tunicamycin (glycosylation inhibitor) and with glycolysis‑only inhibitors that do not hit glycosylation (iodoacetate, PFKFB3 inhibitor). Match exposure windows & doses

Surface CD25 maturation and glyco‑shift (PNGase F or Endo H blot on IL‑2Rα).

pSTAT5 after an IL‑2 pulse at 24–48 h of priming with or without 2DG.

Phenocopy with IL‑2 neutralization or low‑dose JAK3 inhibition. Counter‑rescue with high IL‑2 or an IL‑2 mutein.
https://x.com/DeryaTR_/status/1954354352648225235
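As a quick sanity check on the dose-response numbers GPT-5 cites above (a minimal sketch; the percentages are approximate values read from the figure, exactly as stated in the quote):

```python
# Approximate frequencies (% positive) as quoted from the figure:
# 0 mM vs 3 mM 2DG at first activation, assessed two weeks later.
baseline = {"CCR6+": 0.7, "CD161+": 1.0, "IL-17A+": 0.3}
with_2dg = {"CCR6+": 12.0, "CD161+": 21.0, "IL-17A+": 4.0}

for marker, base in baseline.items():
    fold = with_2dg[marker] / base
    print(f"{marker}: ~{base}% -> ~{with_2dg[marker]}% ({fold:.0f}x enrichment)")
```

All three markers show an order-of-magnitude enrichment, which is why a selection-only explanation (hypothesis 4) and a programming explanation (hypothesis 3) both remain on the table.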

You are likely going to see a lot of very varied results posted online from GPT-5 because it is actually multiple models, some of which are very good and some of which are meh. Since the underlying model selection isn’t transparent, expect confusion. https://x.com/emollick/status/1953553844094611614

RT @deedydas: Huge computer science result: A Tsinghua professor JUST discovered the fastest shortest path algorithm for graphs in 40 years… https://x.com/algo_diver/status/1954423622787039379

Fuck, so everything is basically stochastic interpolants. The world needs a simpler introduction to Schrödinger bridges and stochastic interpolants. The math right now is probably too unfriendly for normies + bonus point for a simple PyTorch implementation. https://x.com/cloneofsimo/status/1955293818435096914

Here is how we are prioritizing compute over the next couple of months in light of the increased demand from GPT-5: 1. We will first make sure that current paying ChatGPT users get more total usage than they did before GPT-5. 2. We will then prioritize API demand up to the… https://x.com/sama/status/1955077002945585333

GPT-5 is the most significant product release in AI history, but not for the reason you might think. What it signals is that we’re moving from the “bigger model, better results” era to something much more nuanced. This is a genuine inflection point. The fact that people call a… https://x.com/douwekiela/status/1955329657852834207

RT @Yuchenj_UW: The irony of AI: smarter than a PhD, dumber than an intern. https://x.com/Yuchenj_UW/status/1955119993189998718

GPT-5 takes 55% more time than Sonnet 4, but is 40% cheaper on the RooCode Leaderboard Which one are you choosing? https://x.com/scaling01/status/1955669720843358502
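One way to read that tradeoff (a hypothetical sketch: the 100-second / $1.00 Sonnet 4 baseline is invented purely for illustration; only the +55% time and −40% cost deltas come from the leaderboard post):

```python
# Hypothetical baseline for illustration: Sonnet 4 at 100 s and $1.00 per task.
sonnet_time_s, sonnet_cost_usd = 100.0, 1.00

gpt5_time_s = sonnet_time_s * 1.55      # 55% more time, per the post
gpt5_cost_usd = sonnet_cost_usd * 0.60  # 40% cheaper, per the post

print(f"Sonnet 4: {sonnet_time_s:.0f} s, ${sonnet_cost_usd:.2f}")
print(f"GPT-5:    {gpt5_time_s:.0f} s, ${gpt5_cost_usd:.2f}")
```

So the choice reduces to whether your workflow is latency-bound or budget-bound.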

RT @karpathy: I’m noticing that due to (I think?) a lot of benchmarkmaxxing on long horizon tasks, LLMs are becoming a little too agentic b… https://x.com/teortaxesTex/status/1954398794604253335

RT @QodoAI: Qodo Command—our CLI AI agent—just scored 71.2 on the SWE-benchmark, high enough to put us in the top 5. It achieved this sc… https://x.com/hwchase17/status/1955110032720400464

State of torch.compile, August 2025. https://x.com/ezyang/status/1955820298907082876

Are frontier AI models really capable of “PhD-level” reasoning? To answer this question, we introduce FormulaOne, a new reasoning benchmark of expert-level Dynamic Programming problems. We have curated a benchmark consisting of three tiers, in increasing complexity, which we call… https://x.com/shai_s_shwartz/status/1955968602978320727

ChatGPT-5 Pro is the first model to successfully do this non-puzzle consistently. GPT-5 Thinking and GPT-5 fail as every other model before has (except for, occasionally, Sonnet). https://x.com/emollick/status/1953604710205690212

RT @deedydas: Ridiculous that OpenAI claimed 74.9% on SWE-Bench just to prove they were above Opus 4.1’s 74.5%… By running it on 477 probl… https://x.com/akbirkhan/status/1954231799590301953

GPT-5 occupies both the #1 and #2 positions in our long context reasoning benchmark (AA-LCR) 🤯 AA-LCR tests long context performance through testing reasoning capabilities across multiple long documents (~100k tokens). Questions typically require considering multiple documents… https://t.co/BEq9ZspRMs
https://x.com/ArtificialAnlys/status/1953523986526351576

Ok, some general comments while I am waiting for the other tasks. GPT-5 and GPT-5 Thinking are nothing very novel for research mathematicians compared to o3-pro and o4-mini-high. But GPT-5 Pro is different. It uses much more compute and the quality of the answers is superb.
https://x.com/nasqret/status/1953566692686397885

Vibe coding a full Excel clone step-by-step with GPT-5 is kinda nuts. (I know I’m still far from a full clone, but this is after about 45 minutes of work) https://t.co/upr5QSmluJ
https://x.com/MatthewBerman/status/1954694677736956297

GPT-5 has been hovering around a 7% diff edit failure rate since its release to Cline last Thursday. How have you liked GPT-5 so far in Cline? https://x.com/cline/status/1955357460627329151

GPT-5 is live in Cline. We’ve been working with OpenAI to get this model ready, and here’s our take: it’s disciplined, persistent, & highly competent. It’s collaborative in planning and a diligent operator while acting. It plans thoroughly, asks optioned follow-ups when… https://x.com/cline/status/1953525433808695319

GPT-5 is speed-running Pokémon. It’s 3x faster than o3. https://x.com/scaling01/status/1955813023735828587

gpt-5 is the best coding model in the world and is now the default in @cursor_ai. https://x.com/gdb/status/1953521501548032512

GPT-5 Just Finished Pokemon Red! : r/singularity https://www.reddit.com/r/singularity/comments/1mq2irv/gpt5_just_finished_pokemon_red/

GPT-5 just finished Pokémon Red! 6,470 steps vs. 18,184 for o3! Check the stats site to compare! That’s a huge improvement! Well done, @OpenAI you cooked with GPT-5. What an incredible model. Next up: GPT-5 vs. Pokémon Crystal (16 Badges + Red). The run starts soon on Twitch. https://x.com/Clad3815/status/1955980772575268897
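The step counts quoted work out to just under a 3x improvement over o3:

```python
# Step counts as reported for the Pokémon Red runs.
o3_steps, gpt5_steps = 18_184, 6_470

ratio = o3_steps / gpt5_steps
print(f"GPT-5: {gpt5_steps} steps vs o3: {o3_steps} steps ({ratio:.2f}x fewer)")
```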

GPT-5 now rolled out to 20% of paid users and doing >2B TPM on the API! so far so good… excellent work by the eng and infra teams! https://x.com/sama/status/1953563605733118317

gpt-5 is SOTA on FrontierMath: https://x.com/gdb/status/1953710811957858404

Not all data is created equal. Scaling quality control for data that can challenge PhDs and the most advanced LLMs demands a different approach. To meet this demand, we built something new: autoraters powered by multi-agent model debate. Here’s how it works 🧵 https://x.com/scale_AI/status/1955405890288570617

RT @Zai_org: Presenting the GLM-4.5 technical report!👇 https://x.com/_lewtun/status/1955242926596035023

I’m noticing that due to (I think?) a lot of benchmarkmaxxing on long horizon tasks, LLMs are becoming a little too agentic by default, a little beyond my average use case. For example in coding, the models now tend to reason for a fairly long time, they have an inclination to… https://x.com/karpathy/status/1954224651443544436

RT @scaling01: Anthropic is the only company where LLMs get more expensive over time https://x.com/scaling01/status/1955313676665151704

“Write a single paragraph about someone who doles out their remaining words like wartime rations, having been told they only have ten thousand left in their lifetime. They’re at 47 words remaining, holding their newborn.” https://x.com/emollick/status/1954788724086063446

JAX = performance & scalability. Keras 3 = high velocity development, compact code, best practices by default. Both at the same time = pretty killer. https://x.com/fchollet/status/1954686735646068772

Rather than vague posting about upcoming releases, you can get the same effect from very specific posting. “Very excited about the new release, and I think you can see why from these weights: [0.0178, -0.0921]” https://x.com/emollick/status/1953353407982342541

RT @SemiAnalysis_: At the end of the day, the SWE-bench leaderboard on swebench dot com is probably the most clear description of current m… https://x.com/jeremyphoward/status/1955070796256383137

AIBrix LLM inference infrastructure evaluations focusing on H20s. Qwen and DeepSeek boast of support for large-scale EP, but do these numbers look good? Interesting details on the KVCache offloading framework and other cache operations. https://x.com/teortaxesTex/status/1954464993333698758

After thinking about this problem for months, I am so happy to finally introduce DetailBench! It answers a simple question: How good are current LLMs at finding small errors, when they are *not* explicitly asked to do so? (Yes, the graph is right!) https://x.com/xeophon_/status/1956025495515979984

The current state of AI for sustained work: exponential progress continues with no unexpected leaps but also no walls. (Yes, this METR measure is just one of many benchmarks, and like all benchmarks has flaws, but it also has the advantage of having neither a ceiling nor a floor effect.) https://x.com/emollick/status/1954180531785994670

We tested how autonomous AI agents perform on real software tasks from our recent developer productivity RCT. We found a gap between algorithmic scoring and real-world usability that may help explain why AI benchmarks feel disconnected from reality. https://x.com/METR_Evals/status/1955747420324946037

Measuring Thinking Efficiency in Reasoning Models: The Missing Benchmark https://x.com/NousResearch/status/1956090990005248341

To get a sense of GPT-5’s vibes, I exported my Tweet data over the last year and got it to write like my top posts. Then I took my newsletter and made it create 3 separate long-form tweets. It’s not 100% there, but it beats Claude, which was previously my go-to for editing. https://x.com/rowancheung/status/1953505497237029346

The enhance meme from Blade Runner, except the AI is asking the computer to enhance. https://x.com/emollick/status/1954534598903275605

@jxmnop @johnschulman2 @srush_nlp Super cool stuff!! How can we empirically check how far away this model is from the real base model? What benchmarks do we expect this base model to do better on and what benchmarks do we expect it to do worse on [when compared to the unmodified gpt-oss model]? https://x.com/OfirPress/status/1955463664556769426

This is fraud, no 2 ways about it. >10% performance degradation is a joke. This is the equivalent of a store emptying out 10% from a can and selling you the remaining 90% at full price. https://x.com/nrehiew_/status/1955613510463037611

June mental health emergency room data just came out – still flat despite the ubiquity of LLM use, which might be informative as a limit on what the underlying phenomenon could be https://x.com/literalbanana/status/1948026993787977822

BI confirmed my intel. We’re gradually symmetrizing information in the private markets. 2nd order effects in capital allocation will be sharper pricing, higher capital velocity, and the collapse of marginal arbitrage — shifting leverage to founders. Good. https://x.com/arfurrock/status/1955740969116299466

wtf more than half its parameters are embeddings. https://x.com/code_star/status/1956033343465906379

Call me crazy but I think the newer openai models inc. 5 (and gemini, other maxxing models) are nearly unusable for most of my use cases. Far too verbose, extraordinarily confident in their hallucinations, difficult to steer at times but at the same time fairly sycophantic… https://x.com/jon_durbin/status/1954263916202316001

A conversation with @demishassabis on world models (genie 3), deep think, the need for better evals (game arena), and our progress towards AGI. https://x.com/OfficialLoganK/status/1954951468878090552

As AI models get commoditized, the value will be added in that final layer of orchestration. Not just routing to just one “best” model, but coordinating multiple models to combine strengths and create Chain of Debate. https://x.com/mustafasuleyman/status/1954956981330120832

GPT-5 with high reasoning effort on SimpleBench https://x.com/scaling01/status/1953771276549358041

gpt-5: our smartest, fastest, and most useful model to date. it’s also incredible at coding. rolling out to everyone (excitingly including free ChatGPT users!) today. https://x.com/gdb/status/1953509854603358597

I had access to GPT-5. I think it is a very big deal as it is very smart & just does stuff for you. Full write-up in comments, but this is “make a procedural brutalist building creator where i can drag and edit buildings in cool ways” & “make it better” a bunch. I touched no code. https://x.com/emollick/status/1953502029126549597

I saw a lot of people complaining about 32k context size in ChatGPT for plus users, which would be terrible for coding. But actually we are giving 196k context size for plus users when using GPT-5 Thinking, and that’s the model you should use for coding use-cases! 32k is for the… https://x.com/yanndubs/status/1955194413283737716

I suspect this is right. And I wouldn’t be surprised if the vast majority of the 700M users of ChatGPT already greatly prefer GPT-5 & that the opinion on X is not reflective of the typical experience. (Which doesn’t mean that the issues identified here aren’t very real.) https://x.com/emollick/status/1954442950491902393

I used GPT-5 to leverage trade memes. From open to taking profit to setting stop losses, I followed every choice it made. Of course, I used @wasabi_protocol and decided to do this all with $troll on 3x leverage, which was freshly listed and giga sending. https://x.com/ChrisCoffeeEth/status/1954282100389281866

I’ve been using gpt-5 for a bit now. This model broke me. It is so good. I didn’t know what the price was. I assumed it would be o3-pro priced because it is that smart. Nope. Truly insane. Videos coming very soon. https://x.com/theo/status/1953507203979391011

If you have been following the GPT-5 rollout, one thing you might be noticing is how much of an attachment some people have to specific AI models. It feels different and stronger than the kinds of attachment people have had to previous kinds of technology (and so suddenly… https://x.com/sama/status/1954703747495649670

Important GPT-5 PSA: if you want an answer that is maximally correct, do tell the model to think hard in your prompt. It literally will do so. Clearly we failed to communicate this well, apologies for that. https://x.com/ericmitchellai/status/1954418339536683078

LLM meets analog. Turns out LLMs are a great brainstorming partner for synth patches. This was co-created with gpt-5 🎵 https://x.com/martin_casado/status/1953868101596192850

Let’s take a look into GPT-5’s record-setting performance on FrontierMath. How did it perform on the holdout vs. non-holdout set, how did it do across tiers, and what new Tier 4 problems did it solve? 🧵 https://x.com/EpochAIResearch/status/1955667249252978741

initial gpt-oss download stats looking exciting! https://x.com/gdb/status/1954992508964155587

i thought the transformers gpt-oss MoE finetuning was broken, how did you get it working? https://x.com/jxmnop/status/1955347764130254863

OpenAI’s o3 Crushes Grok 4 In Final, Wins Kaggle’s AI Chess Exhibition Tournament – Chess.com https://www.chess.com/news/view/kaggle-game-arena-chess-2025-day-3

Breaking: GPT-5 ranked 🥇 on Humanity’s Last Exam and 🥈 on MultiChallenge SEAL Leaderboards. https://x.com/scale_AI/status/1953591873031090505

GPT-5-Pro is finally rivalling Elon Musk’s 148 IQ https://x.com/scaling01/status/1955344356547653773

Overall, the general vibes of GPT-5 feel much more human-like. It’s hard to measure *vibes*, but the combination of speed, lower hallucination rate, and intelligence is very noticeable. As a power user, I’ve always enjoyed o3, but the speed makes it impossible for daily queries… https://x.com/rowancheung/status/1953505371487600877

Our team spent some time benchmarking the GPT-5 models on one-shot document understanding capabilities. Sharing some WIP results 💡: 1️⃣ GPT-5 mini does a good job. From initial testing it edges out Sonnet and Gemini models. 2️⃣ Surprisingly GPT-5 is middle of the pack (and also… https://x.com/jerryjliu0/status/1954293351702036712

unicorn-drawing eval from gpt-3.5 to gpt-5: https://x.com/gdb/status/1953528523437600892

This is exactly what @javirandor did with chatgpt last year to get it to spit out memorized training data: https://t.co/dtFpAAw6ps There’s probably some interesting stuff to study on such “re-based” models. https://x.com/florian_tramer/status/1955510942252572946

@SebastienBubeck Heads up, I’m fairly certain that the o3 run being compared to did not have the google search tool – which is important, since Bulbapedia gives the solution in one tool-use call to some puzzles that can take a much longer time if solved without solution info. https://x.com/kiranvodrahalli/status/1956044490885751273

“Narrate a regency romance like a machine learning model being trained. Each step of the story must be described as data inputs, probability weights & optimization functions. The romance must emerge through correlation coefficients & unexpected variables in the training data” https://x.com/emollick/status/1955217534812082574

We’ve scored highly enough to achieve gold at this year’s IOI online competition with a reasoning system — placing #6 when ranked with humans and #1 when ranked with other AIs. In just a few weeks: • 2nd at AtCoder • Gold medal-level at IMO • Gold medal-level at IOI https://x.com/OpenAI/status/1954969035713687975

@finbarrtimbers To improve the router, you need some signal on when a routing decision is a good one. Not sure how helpful implicit feedback (or explicit thumbs-ups) will be. https://x.com/jefrankle/status/1955634983021998252

4chan continues to launch proton torpedoes into the riddle-shaped thermal exhaust port of our much maligned Death Star I don’t think it’s just a non-reasoning problem, *boys* I had hoped we’d have figured this out before IMO Golds https://x.com/teortaxesTex/status/1954741943952666629

GEPA optimizer, the prompt optimization technique that beats RL, is out in DSPy 3.0.0! https://x.com/MaximeRivest/status/1955431980868542692

Getting my head around DSPy’s advanced optimizers was a little tricky for me. There are many steps and parameters controlling the process. So I dug deep into MIPROv2 and wrote a guide on how to configure it in detail! While I was at it I built a website to host notes like… https://x.com/heylegacyguy/status/1955682283270078484

Great work!! We would love to see the patches upstreamed so the recipe will work for future versions of vLLM as well. 🤝 https://x.com/vllm_project/status/1955137499166081464

I bet the router’s going to get very good very fast. The amount + quality of data you can get from serving it in prod is a step change from what you get in dev. https://x.com/finbarrtimbers/status/1955633697635631112

I really don’t want a conversational AI that *interrupts me*. It should not sound *natural*, it should sound like what it is: *artificial*. I want clarity, pleasantness, not confusion and depressing lies. https://x.com/francoisfleuret/status/1955004348397916614

pip users are like that tribe on a remote island no one is allowed to contact. we will be allowed to live unmolested as one with nature. https://x.com/vikhyatk/status/1955355576055263690

So, what are the open pretraining datasets that people use nowadays? Suppose 12-15T tokens needed. – Fineweb edu 1.5T – dclm 750B – zyda 2 5T – Dolma (3T but some subsets are probably more useful) https://x.com/nrehiew_/status/1955109618528456954
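Tallying the datasets listed there (a rough sketch, taking Dolma at its 3T headline figure):

```python
# Open pretraining datasets and token counts (in trillions), as listed in the post.
datasets = {"FineWeb-Edu": 1.5, "DCLM": 0.75, "Zyda-2": 5.0, "Dolma": 3.0}
total = sum(datasets.values())
target_low, target_high = 12.0, 15.0  # the 12-15T budget posed in the question

print(f"Total open tokens listed: {total}T")
print(f"Shortfall vs. the {target_low}-{target_high}T budget: "
      f"{target_low - total}T to {target_high - total}T")
```

Even counting everything at face value, the list comes up a few trillion tokens short, which is presumably the point of the question.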

Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models. “Our work here reveals a critical phenomenon, temporal oscillation, where correct answers often emerge in the middle process, but are overwritten in later denoising steps. To address this issue, we…” https://x.com/iScienceLuvr/status/1955964748341919862

when i was a young padawan i used to resent the older engineers in the org because they wouldn’t let me add the latest new framework to the codebase now im old and tired. the young fucks want to use uv but i already have the pip commands memorized https://x.com/vikhyatk/status/1954507093488349597

OverFill: Two-Stage Models for Efficient Language Model Decoding. “OverFill begins with a full model for prefill, processing system and user inputs in parallel. It then switches to a dense pruned model, while generating tokens sequentially. Leveraging more compute during…” https://x.com/iScienceLuvr/status/1955965909409120476

realizing DRY was fake and unnecessary didn’t affect me too much but learning that it’s fine to store money as floats and nothing bad really happens broke me https://x.com/vikhyatk/status/1954725001913114694

Another paper comparing diffusion and autoregressive language models in the many-epoch scenario. It also comes with a good open review of the other one that I shared recently. (Happening here on x dot com, the everything site!) https://x.com/giffmana/status/1954283272424595547

Diffusion language models (DLMs) are cool, but people wonder why they are better than autoregressive language models (ARLMs). More experiments need to be done, but preliminary results like this show diffusion models squeeze more performance out of the data. In addition to its speed… https://x.com/iScienceLuvr/status/1954765986214871489

RT @NiJinjie: Token crisis: solved. ✅ We pre-trained diffusion language models (DLMs) vs. autoregressive (AR) models from scratch — up to… https://x.com/arankomatsuzaki/status/1954242373145543134

Would not call this a base model just because it can output “fineweb-like data”. Still interested to see how far you can reverse engineer training data by doing a small perturbation on the model weights. https://x.com/eliebakouch/status/1955479573489213593

do not, my friends, become addicted to synthetic data. it will take hold of you, and you will resent its absence https://x.com/typedfemale/status/1954284624076767705

DSPy 3.0!! 🎉🎉🎉 A lot to love! GRPO and RL training makes its way to DSPy, alongside two new optimization algorithms, SIMBA and GEPA! 🎁 Awesome to see continued innovations in Prompt Optimizers and these new optimization algorithms that leverage the natural language… https://x.com/CShorten30/status/1955445406441033906

GLM 4.5 report. I missed this. Most of the text is about a complex post-training strategy. Recommended reading. @kalomaze I haven’t seen a clarification of whether Air was also trained with Muon (blog posts were vague on it); here it seems that yes, it was. https://x.com/teortaxesTex/status/1954754947892850913

I wanted to provide more color on this because it’s a bigger deal than the main Ai2 account is hyping it as. For example, the entire annual budget of the NSF for AI in 2026 is $655M. To commit to training models on a single line item of about 20% of that is a huge deal. https://x.com/natolambert/status/1955986546626322479

RT @dorsa_rohani: New fastest shortest-path algorithm in 41 years! Tsinghua researchers broke Dijkstra’s 1984 “sorting barrier,” achieving… https://x.com/dilipkay/status/1954701721932046423

Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning. “We introduce GFPO (Group Filtered Policy Optimization), which curbs this length explosion by sampling larger groups per problem during training and filtering responses to train on based on two…” https://x.com/iScienceLuvr/status/1955955524790575212

Traditional pre-training had diminishing returns (which is what the “scaling law” predicted anyway). The fact that reasoners were developed at exactly the moment when pre-training faltered is exactly the pattern of how Moore’s Law works: new techniques appear to maintain the trend. https://x.com/emollick/status/1953901107144786085

Have you ever felt you are developing CUDA kernels and your tests often run into illegal memory access (IMA for short) and you have no idea how to debug? We have collaborated with the @nvidia team to investigate how CUDA core dump can help; check out the blog post to learn more! https://x.com/vllm_project/status/1955478388178817298

Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models. “we replace reward guided test-time noise optimization in diffusion models with a Noise Hypernetwork that modulates initial input noise.” “We show that our approach recovers a substantial portion of the…” https://x.com/iScienceLuvr/status/1955958029993828724

RT @AISecurityInst: How can open-weight Large Language Models be safeguarded against malicious uses? In our new paper with @AiEleuther, we… https://x.com/BlancheMinerva/status/1955228688296866285

#CodingWithGLM 🤝 @Kilo_Code It’s happening! We’re kicking off the #CodingWithGLM series, bringing GLM-4.5 to the dev tools you use every day. First up, the incredible @Kilo_Code🚀 Unlock lightning-fast code generation, smart refactoring, and instant explanations right inside… https://x.com/Zai_org/status/1955627932543840510

GLM-4.5V is now available on Anycoder. Thanks AK! @_akhaliq https://x.com/Zai_org/status/1955092307843154093

The GLM-4.5 tech report is worth reading. https://x.com/bigeagle_xd/status/1954763239738519618

Which OpenAI model is currently your daily driver? https://x.com/bilawalsidhu/status/1955724935705309642

🖥️🤖 LangGraph CLI Connect to LangGraph Platform directly from the terminal! Featuring comprehensive management of assistants, threads, and runs with real-time streaming capabilities. Explore the CLI on GitHub 🚀 https://x.com/LangChainAI/status/1954226169412493544


Discover more from Ethan B. Holland
