Image created with gemini-2.5-flash-image, prompted via claude-sonnet-4-5. Image prompt: Seamless repeating wrapping paper pattern in deep navy and antique gold featuring ornate Victorian compass roses, navigation instruments, directional arrows, and ethical symbols in elegant damask style, with ‘Alignment’ integrated as decorative cartographic typography, subtle embossed texture, sophisticated gift wrap quality
Bloom – an open-source agentic tool that auto-generates behavioral evaluations for AI models, by @AnthropicAI. It turns what was once painstaking alignment work into a matter of configuration. Bloom crafts and judges hundreds of scenarios targeting specific traits like… https://x.com/TheTuringPost/status/2003629256522498061
Andrej Karpathy on X: I’ve never felt this much behind as a programmer. The profession is being dramatically refactored as the bits contributed by the programmer are increasingly sparse and in-between. I have a sense that I could be 10X more powerful if I just properly string together what has become available over the last ~year, and a failure to claim the boost feels decidedly like a skill issue. There’s a new programmable layer of abstraction to master (in addition to the usual layers below) involving agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations, and a need to build an all-encompassing mental model for strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old fashioned engineering. Clearly some powerful alien tool was handed around except it comes with no manual and everyone has to figure out how to hold it and operate it, while the resulting magnitude 9 earthquake is rocking the profession. Roll up your sleeves to not fall behind.
https://x.com/karpathy/status/2004607146781278521
Introducing Bloom: an open source tool for automated behavioral evaluations | Anthropic https://www.anthropic.com/research/bloom
We’re hiring for the safety team at xAI. We work on RL post-training, alignment/model behavior, and reducing catastrophic risk. This work is incredibly high impact. If you have strong ML skills and this sounds exciting to you, DM me. https://x.com/StewartSlocum1/status/2005710683623809440
A lot of people underestimate AI due to the confluence of 4 OpenAI choices: 1) GPT-5.x instant is not a very smart model 2) Most users are free users & the ChatGPT router sends them to instant often 3) The router calls everything GPT-5.2 4) Most people don’t know Reasoners exist https://x.com/emollick/status/2001840267155153362
Today @OpenAI updated the Model Spec, laying out how models are ‘intended to behave.’ Not marketing. Just explicit rules, priorities, and tradeoffs. Great reading if you’re wondering why models respond the way they do. Changelog + teen protections in 🧵👇 https://x.com/shaunralston/status/2001744269128954350
You can now adjust specific characteristics in ChatGPT, like warmth, enthusiasm, and emoji use. Now available in your “Personalization” settings. https://x.com/OpenAI/status/2002099459883479311
We are hiring a Head of Preparedness. This is a critical role at an important time; models are improving quickly and are now capable of many great things, but they are also starting to present some real challenges. The potential impact of models on mental health was something we… https://x.com/sama/status/2004939524216910323?s=20
Waymo dropped a blog that effectively confirms the “dependency trap” I tweeted about. The SF incident happened because of a “backlog” of “confirmation checks” requiring remote human operators. Humans are a module in Waymo’s stack. That module does not scale. https://x.com/Yuchenj_UW/status/2003708815934640536
I love the expression “food for thought” as a concrete, mysterious cognitive capability humans experience but LLMs have no equivalent for. Definition: “something worth thinking about or considering, like a mental meal that nourishes your mind with ideas, insights, or issues that…” https://x.com/karpathy/status/2001699564928279039
my only goal for 2026 is to not have the worst year of my life for the 4th year in a row https://x.com/zoeloveshouses/status/2005704976627351571
The constant Rip van Winkle astonishment of almost every AI model that GPT-5 exists remains pretty amusing (if annoying for practical purposes), as does their sheer incredulity about the state of the world in late 2025. Thinking traces full of “wait, that can’t be right…” https://x.com/emollick/status/2002548186511179907
It is clear talking to journal editors that there is no consensus about how to adjust peer review for the current flood of AI-written bad papers (bad papers now look like good papers, making reviewing harder), let alone the prospect of a flood of good papers developed with AI… https://x.com/emollick/status/2002936845290832282
This paper asked 25 different AI models to write a metaphor about time. Nearly all said “time is a river” or “time is a weaver.” It is not completely clear why: likely overlapping training, alignment processes, and synthetic data contamination. More idea diversity would be good https://x.com/emollick/status/2002183640453685280
China issues draft rules to regulate AI with human-like interaction | Reuters https://www.reuters.com/world/asia-pacific/china-issues-drafts-rules-regulate-ai-with-human-like-interaction-2025-12-27/
Capability overhang means too many gaps today between what the models can do and what most people actually do with them. 2026 prediction: progress towards AGI will depend as much on helping people use AI well, in ways that directly benefit them, as on progress in frontier models. https://x.com/OpenAI/status/2003594025098785145
I had some questions about whether these modifications impacted accuracy of outputs, but was told by the OpenAI team that worked on this that tone does not impact that. (Also I like that we are moving away from discussing prompts to modify AI personality to prompts changing tone.) https://x.com/emollick/status/2002452909657895115
A robot that sees the terrain and predicts its own future… up to 5 seconds ahead? This is real. ❗️Best Systems Paper finalist at #RSS2025 The team introduces a perceptive Forward Dynamics Model that helps legged robots safely navigate rough, complex environments: no manual https://x.com/IlirAliu_/status/2002092349615120757
Your robot moves fast… but objects slide off the tray? This system hears the sliding and learns how to stop it: Researchers at CMU developed a new method that uses sound to model real-world friction in motion planning. It enables time-optimized, high-speed transport without https://x.com/IlirAliu_/status/2003179502545854521
The definition of robotic labor is a moving target. As machines master more complex cognitive and physical tasks, the threshold for what we consider ‘robotic’ will continue to shift. https://x.com/TheHumanoidHub/status/2003840335072678187
Final Sarah Paine lecture: Why Russia lost the Cold War. To me, the most interesting question is not why the Soviet Union ultimately collapsed – it’s how a brutal, centrally planned, stupendously inefficient, colonial land empire survived for so long. I was surprised to learn https://x.com/dwarkesh_sp/status/2002075498101551554