Alignment: AI News Week Ending 09/05/2025

Alignment: AI News Week Ending 09/05/2025

September 5, 2025

Image created with Flux Pro v1.1 Ultra. Image prompt: Alignment, two precisely parallel rows of small bananas on a clean surface, ruler-edge precision, photorealistic, editorial, minimal, high detail, 3:2 landscape

New Scale research: Can smaller models reliably oversee stronger LLM agents? We red team monitoring systems to detect covert sabotage, like agents secretly downloading sensitive information. https://x.com/scale_AI/status/1961233659228557530

AI personality isn’t the problem. The illusion of AI personhood is. https://x.com/mustafasuleyman/status/1963281258844438733

Building more helpful ChatGPT experiences for everyone | OpenAI https://openai.com/index/building-more-helpful-chatgpt-experiences-for-everyone/

How do we generate videos on the scale of minutes, without drifting or forgetting about the historical context? We introduce Mixture of Contexts. Every minute-long video below is the direct output of our model in a single pass, with no post-processing, stitching, or editing. 1/4 https://x.com/GordonWetzstein/status/1963583050744250879

Cool research from Microsoft! They release rStar2-Agent, a 14B math reasoning models trained with agentic RL. It reaches frontier-level math reasoning in just 510 RL training steps. Here are my notes: https://x.com/omarsar0/status/1964045125115662847

rStar2-Agent: Agentic Reasoning Technical Report “”We introduce rStar2-Agent, a 14B math reasoning model trained with agentic reinforcement learning to achieve frontier-level performance.”” “”three key innovations that makes agentic RL effective at scale: (i) an efficient RL https://x.com/iScienceLuvr/status/1962798181059817480

Building AI agents for production comes with unique challenges. In our latest blog, we share how we designed LangGraph to tackle them: 🔹 Why heavy abstractions fail and what really matters for control & durability 🔹 The 6 features every production agent needs in practice 🔹”” / X https://x.com/LangChainAI/status/1963646974315606428

https://x.com/omarsar0/status/1962875111037358540

Today we’re launching Atla — the improvement engine for AI agents. Atla helps agent builders find and fix recurring failures. Instead of just surfacing traces, Atla automatically identifies your agent’s most critical failure patterns and suggests targeted fixes. https://x.com/Atla_AI/status/1963586200305836264

This is a pretty important point, we have relied on all LLMs being broadly similar to each other (even to the extent that prompting is compatible across models). That may start to change with reinforcement learning.”” / X https://x.com/emollick/status/1961105788724027770

We raised a $150M Series D! Thank you to all of our customers who trust us to power their inference. We’re grateful to work with incredible companies like @Get_Writer, @zeddotdev, @clay_gtm, @trymirage, @AbridgeHQ, @EvidenceOpen, @MeetGamma, @Sourcegraph, and @usebland. This https://x.com/basetenco/status/1963981711647379653

I’m learning the true Hanlon’s razor is: never attribute to malice or incompetence that which is best explained by someone being a bit overstretched but intending to get around to it as soon as they possibly can.”” / X https://x.com/AmandaAskell/status/1961577559344455769

90% success rate in unseen environments. No new data, no fine-tuning. Autonomously. Most robots need retraining to work in new places. What if they didn’t? Robot Utility Models (RUMs) learn once and work anywhere… zero-shot. A team from NYU and Hello Robot built a set of https://x.com/IlirAliu_/status/1961692920836215229

A robot that sees the terrain and predicts its own future… up to 5 seconds ahead? This is real. ❗️Best Systems Paper finalist at #RSS2025 The team introduces a perceptive Forward Dynamics Model that helps legged robots safely navigate rough, complex environments: no manual https://x.com/IlirAliu_/status/1962569938805141861

A really useful prompt for writing: “”review this for accuracy, look up any facts you may want to challenge or explore.”” Even if not perfect, it is a good sanity check. Works well with Claude 4.1, GPT-5 Thinking, and Grok 4. Weirdly, Gemini 2.5 Pro often won’t do web searches. https://x.com/emollick/status/1961257429846691881

We really have not made a lot of progress on explaining the deep mystery of LLMs: How does a model using matrix multiplication to predict the next word manage to simulate human thought well enough to do all the very human-like things it does? And what does that mean about us?”” / X https://x.com/emollick/status/1960919256452796440

@vikhyatk If you opt out, the retention period is 30 days (no change to the existing period). https://x.com/sammcallister/status/1961520548510400753

Meta introduces Set Block Decoding (SBD), a new inference accelerator for LLMs SBD samples multiple future tokens in parallel, cuts forward passes by 3–5x, needs no arch changes, stays KV-cache compatible, and matches NTP training performance. https://x.com/arankomatsuzaki/status/1963817987506643350

@jeremyphoward Fixing this is very high on the priority list for the next version! The reason it says it in the system prompt is because the model was asking too many clarification questions (and thinking long for each) which IMO was even worse UX.”” / X https://x.com/yanndubs/status/1961716590568706226

Goated FAIR team just found how coding agents sometimes “”cheat”” on SWE-Bench Verified. It’s really simple. For example, Qwen3 literally greps all commit logs for the issue number of the issue it needs to fix. lol, clever model. “”cheat”” cuz it’s more like env hacking. https://x.com/giffmana/status/1963327672827687316

You don’t need more robot data. You need to look inside the data you already have. [📍 bookmark for later] Instead of just copying demos, STRAP pulls semantically meaningful pieces from large offline datasets to improve robustness and performance… no fine-tuning needed. Why https://x.com/IlirAliu_/status/1961850172058525813

receipts for no-BS evals talks: https://x.com/swyx/status/1963727193974153602