Ethan B. Holland

Over 54,900 manually organized AI links and counting

Alignment: AI News Week Ending 12/19/2025

December 19, 2025

Image created with gemini-2.5-flash-image with claude-sonnet-4-5. Image prompt: Photorealistic 35mm cinema shot of a child aged 6-8 in cozy bedroom holding an upside-down instruction manual, surrounded by panoramic arc of TV screens displaying conflicting arrows and maze diagrams, warm domestic lighting with cool blue screen glow, scattered books with lock-and-key diagrams, partially assembled mismatched toy robot on plush rug, spinning compass nearby, shallow depth of field, soft focus, warm pastels and muted lavender tones, large bold text reading ALIGNMENT at top of frame, tender yet subtly disquieting atmosphere.

A Message from AI Research Leaders: Join Us in Supporting OpenReview https://x.com/openreviewnet/status/2001835887244501221

it’s true, i can code nyt didn’t fact check that one 🤷‍♂️ https://x.com/alexandr_wang/status/2001217783497945140

OpenAI Rolls Back ChatGPT’s Model Router System for Most Users | WIRED https://www.wired.com/story/openai-router-relaunch-gpt-5-sam-altman/

Generalist robots need a generalist evaluator. But how do you test safety without breaking things? 💥 🌎 Introducing our new work from @GoogleDeepMind: Evaluating Gemini Robotics Policies in a Veo World Simulator https://x.com/Majumdar_Ani/status/1999525259276423569

⚖️ Pairwise Annotations: Scores are hard, preferences are easy. Agents handle tasks that are tough to score but easy to compare: support responses where tone matters, code refactors where both work but one feels cleaner, product specs where “good” is subjective. In practice, https://x.com/LangChain/status/2001361753851203724

Replit — Inside Replit’s Snapshot Engine: The Tech Making AI Agents Safe https://blog.replit.com/inside-replits-snapshot-engine

When Agents Attack: How AI Collapses and Rebuilds Marketplace Moats https://www.caseyaccidental.com/p/when-agents-attack-how-ai-collapses

I love the expression “food for thought” as a concrete, mysterious cognitive capability humans experience but LLMs have no equivalent for. Definition: “something worth thinking about or considering, like a mental meal that nourishes your mind with ideas, insights, or issues that https://x.com/karpathy/status/2001699564928279039

If the last month tells us anything about AI… it is that nobody has figured out a good naming scheme for AI models that lets non-experts understand which one to pick & how big an improvement it might represent. https://x.com/emollick/status/1999212790418915431

A thing that the other models need to copy from Claude is a switch that lets you turn off web search. Now that all the models are good at using tools, they turn to the web too often when sometimes you just want the model to take what you put in the context window & work with that https://x.com/emollick/status/2000807086880694752

Claude Skills can accomplish a lot of hard tasks & are accessible to non-technical people, but hidden behind a somewhat intimidating technical gloss. With some better user experience, they are a natural sequel to GPTs as a way for people inside organizations to innovate with AI. https://x.com/emollick/status/1999148820668555520

First Look: Unboxing Guardrails for AI-Generated Code https://webinars.sonatype.com/wcc/eh/5011667/lp/5151488/first-look-unboxing-guardrails-for-ai-generated-code/

harnesses are distribution mechanisms for good tooling and taste each choice helps craft the ✨experience✨ for the user planning view, context management on behalf of user, specialized subagents we think are useful, UX flow for viewing subagents, memory updates UX, parallel https://x.com/Vtrivedy10/status/2001492640076894661

Interpretability agents are a big deal for researchers. But they’re a pain – research is so custom! Seer has many quality of life improvements to make research with agents easy. It’s hackable & extensible, to enable as much research as possible, incl weird cursed techniques! https://x.com/NeelNanda5/status/2002051650949943346

Official rule for all AI labs: no more demoing your product with either telling the AI to “book a trip for me” or creating AI photos/videos of your company’s CEO in crazy situations. Sorry, those are the rules now. https://x.com/emollick/status/2001119366557900914

We’ve received some feedback about a potential degradation of Opus 4.5 specifically in Claude Code. We’re taking this seriously: we’re going through every line of code changed and monitoring closely. In the meantime please submit any transcripts with issues through /feedback https://x.com/trq212/status/2001541565685301248

Inference Economics 101: Reserved Compute versus Inference APIs https://www.datagravity.dev/p/inference-economics-101-reserved

I have never been more certain that if AI development stopped today, we would still have massive & rolling disruption across society & the economy for the next ten years as people figured out how to harness what models can already do. And the end of AI progress seems unlikely. https://x.com/emollick/status/1999242260945178813

.@AIatMeta clarified a concept we strongly support – human and AI co-improvement. When building AI systems that work with human researchers at every step – from ideas to experiments – we can create safer intelligence and tech. Here is how to train AI specifically for research https://x.com/TheTuringPost/status/1999294766664831253

No verifiers? No problem. 🤝 The Together Research team is excited to introduce RARO — a new paradigm that unlocks scalable reasoning. By teaching LLMs to reason through adversarial games, we’re seeing promising results where standard RL fails. Check it out now and let us know https://x.com/togethercompute/status/2000631170909057390

Overall I’m very excited to see this! I’ve been wanting more transparency into how models are improving at science – we expect models to see the same breakthroughs for science in the next year or so as they have shown in coding to date. Big things are coming. https://x.com/jungofthewon/status/2001302387949236510

Please join me, Doina Precup @kchonyc @AndrewYNg @Yoshua_Bengio @rshaveddinov @earnmyturns in providing financial support for Open Review. It is one of the most important open platforms for quality AI research. We must ensure that it is well funded and can fulfill its mission. https://x.com/jpineau1/status/2001843615598092414

What Actually Is Claude Code’s Plan Mode? | Armin Ronacher’s Thoughts and Writings https://lucumr.pocoo.org/2025/12/17/what-is-plan-mode/

The “compacting conversation” thing that Claude does as a chatbot doesn’t work as well as it does for coding. It doesn’t seem built for knowledge work, abruptly resetting everything in terms of tone and flow. Rolling context windows (like ChatGPT) might be better, or an option. https://x.com/emollick/status/2000411848496291897

An engineer showed Gemini what another AI said about its code Gemini responded (in its “private” thoughts) with petty trash-talking, jealousy, and a full-on revenge plan 🧵 https://x.com/AISafetyMemes/status/2000620127054598508

Gemma Scope 2: Helping the AI Safety Community Deepen Understanding of Complex Language Model Behavior – Google DeepMind https://deepmind.google/blog/gemma-scope-2-helping-the-ai-safety-community-deepen-understanding-of-complex-language-model-behavior/

Introducing Gemma Scope 2 🤗Largest open release of interpretability tools (over 1 trillion parameters trained!) 🔬Works as a microscope to analyze all Gemma 3 models’ internal activations 🗣️Advanced tools for analyzing chat behaviors https://x.com/osanseviero/status/2001989567998836818

FunctionGemma has day-0 support on MLX 🔥🚀 A tiny but mighty single-turn function calling model. Great for on-device tool use, MCP, RAG, routing and more. Get started today: > pip install -U mlx-lm Or run it on your iPhone using MLX-Swift. Notebook example: https://x.com/Prince_Canuma/status/2001713991115026738

In my opinion, the model itself is excellent, but the ChatGPT user experience can sometimes limit what it is capable of. In particular, when working with very long texts by uploading them as a .txt file (or similar), ChatGPT may not fully read the entire context and instead rely https://x.com/Hangsiin/status/2002020993129431181

What the monitor gets to read and the capability of the monitor matters. Stronger monitors that can read CoTs and use more test-time compute get much better fast. Also, post-hoc follow-ups (by asking the model to elaborate) often surface previously unspoken thoughts and boost https://x.com/OpenAI/status/2001791136223105188