Image created with gemini-2.5-flash-image with claude-sonnet-4-5. Image prompt: Cinematic wide shot of a futuristic orbital ring station in deep space, two symmetrical command centers facing each other across a dark void with a glowing precision gyroscope floating between them, cool blue and muted green lighting with dramatic rim light, sleek military sci-fi aesthetic, epic scale showing the tension between human control and AI systems, Ender’s Game inspired strategic environment
We’re open-sourcing an evaluation used to test Claude for political bias. In the post below, we describe the ideal behavior we want Claude to have in political discussions, and test a selection of AI models for even-handedness: https://x.com/AnthropicAI/status/1989076472208978127
This is a hard area to get right, but we’ve been pretty consistent in trying to make Claude approach political topics fairly. I actually think a lot of existing norms around respect and professionalism can inform how AI models should navigate these issues. https://x.com/AmandaAskell/status/1989328363077382407
Measuring political bias in Claude \ Anthropic https://www.anthropic.com/news/political-even-handedness
Project Fetch: Can Claude train a robot dog? \ Anthropic https://www.anthropic.com/research/project-fetch-robot-dog
New Anthropic research: Project Fetch. We asked two teams of Anthropic researchers to program a robot dog. Neither team had any robotics expertise–but we let only one team use Claude. How did they do? https://x.com/AnthropicAI/status/1988706380480385470
I don’t want to live in a world where AI transcends humanity. I don’t think anyone does. https://x.com/mustafasuleyman/status/1986502379160965164
We can’t build superintelligence just for superintelligence’s sake. It’s got to be for humanity’s sake, for a future we actually want to live in. It’s not going to be a better world if we lose control of it. https://x.com/mustafasuleyman/status/1987927163061035364
What kind of AI does the world really want? At @MicrosoftAI, we’re working towards Humanist Superintelligence (HSI): incredibly advanced AI capabilities that always work for, in service of, people and humanity. And to do this we have formed the MAI Superintelligence Team. https://x.com/mustafasuleyman/status/1986433769046483430
It shouldn’t be controversial to say AI should always remain in human control – that we humans should remain at the top of the food chain. That means we need to start getting serious about guardrails, now, before superintelligence is too advanced for us to impose them. https://x.com/mustafasuleyman/status/1986834581576941763
Two things can be true. If you’re not amazed by AI, you don’t really understand it. If you’re not afraid of AI, you don’t really understand it. https://x.com/mustafasuleyman/status/1988111942490415180
🟡 NEW: Microsoft is joining the race for superintelligence, but with a caveat: It will prioritize human control over the technology at the expense of maximum capability. https://x.com/semafor/status/1986433905520525789
Let’s just say the new @MicrosoftAI Superintelligence Team is pretty pumped… Even more GPUs go brrrr https://x.com/mustafasuleyman/status/1988655676869214687
GPT-5.1 is a great new model that we think people are going to like more than 5. But with 800M+ people using ChatGPT, one default personality won’t work for everyone. We launched new preset personalities so people can make ChatGPT their own. https://x.com/fidjissimo/status/1988683216681889887
Moving beyond one-size-fits-all – Fidji Simo https://fidjisimo.substack.com/p/moving-beyond-one-size-fits-all
We’ve developed a new way to train small AI models with internal mechanisms that are easier for humans to understand. Language models like the ones behind ChatGPT have complex, sometimes surprising structures, and we don’t yet fully understand how they work. This approach https://x.com/OpenAI/status/1989036214549414223
Advanced AI models shift their “beliefs” as they encounter new information & have interactions with people. Active persuasion works but effects come from overall context. Aside from alignment issues, shows why “SEO for agents” is not simple, as AI behavior can vary with context. https://x.com/emollick/status/1986264795973144742
Most diffusion policies can imitate behavior… but they can’t guarantee safe behavior. That’s a problem when robots share space with people. Researchers from TUM and Stanford propose Path-Consistent Safety Filtering (PACS), a method that gives formal safety guarantees to https://x.com/IlirAliu_/status/1988550922444693898
Robots are great at following instructions. But what happens when those instructions fail? Most Vision-Language-Action models freeze or repeat the same mistake. A new approach called FailSafe shows how robots can detect and fix their own failures. The method uses a companion https://x.com/IlirAliu_/status/1986353266322538634
I do think people often err on the side of trying to make their prompts too succinct, even if the idea they’re trying to move from their own brain into the model’s brain is very complex. I have some >100 page prompts that I use pretty regularly. https://x.com/AmandaAskell/status/1986571451902927017
Interesting to see work I’ve done (suicidal ideation detection) and advocated for (anthropomorphism blockers) in NY law. It’s cool to both code up solutions and articulate ethical principles in development🐇🌈. https://x.com/mmitchell_ai/status/1988358221418106959
One question I have about GPT 5.1 is whether “personality” biases the model outputs. There is reason to suspect that custom instructions might impact the quality and type of answers that you receive. I would love more information about whether that is still true for personality. https://x.com/emollick/status/1988708086815821879
Weave helps you systematically catch LLM hallucinations. It logs all inputs, outputs, and scores in one dashboard. You can then run custom “fact-check” scorers and visualize exactly where your model is confidently wrong, turning random spot-checks into a reliable process. https://x.com/weave_wb/status/1987946840550240294
New Video: What matters right now in mechanistic interpretability? A lot has changed in AI and interp! The priorities have moved on, frontier models are WAY more interesting now. I discuss the new big picture, my vision for the field, common mistakes and promising directions. https://x.com/NeelNanda5/status/1989297683140354267
Building our own inference platform was by no means easy, but would have taken significantly longer if not for @modal https://x.com/ArmenAgha/status/1988763002674508160
New York Governor Kathy Hochul’s letter to all companies operating AI companions in New York. https://x.com/AndrewCurran_/status/1988290235382591904
Fei-Fei Li dropping bars on why spatial intelligence matters: “LLMs have begun to transform how we access and work with abstract knowledge. Yet they remain wordsmiths in the dark; eloquent but inexperienced, knowledgeable but ungrounded.” https://x.com/bilawalsidhu/status/1987903363095343437
“I’ve got you, Ron — that’s totally normal, especially with everything you’ve got going on lately.” Who actually wants their model to write like this? Surprised OpenAI highlighted this in the GPT-5.1 announcement. Very annoying IMO. https://x.com/tamaybes/status/1988715705722892371
@testingcatalog actually–it’s better at not using em dashes–if you instruct it via custom instructions https://x.com/OpenAI/status/1988751800808435802
Unlike with normal models, we often find that we can pull out simple, understandable parts of our sparse models that perform specific tasks, such as ending strings correctly in code or tracking variable types. We also show promising early signs that our method could potentially https://x.com/OpenAI/status/1989036218160673103
Can LMs learn to faithfully describe their internal features and mechanisms? In our new paper led by Research Fellow @belindazli, we find that they can–and that models explain themselves better than other models do. https://x.com/TransluceAI/status/1989395421236793374




