Image created with gemini-2.5-flash-image, prompt written by claude-sonnet-4-5. Image prompt: Minimalist composition of ornate golden scales of justice on white marble pedestal in empty modernist courtroom, both scale plates completely empty, cold blue moonlight streaming through tall windows creating dramatic shadows on polished floor, architectural photography style, pristine and untouched, word ETHICS in bold white sans-serif overlaid prominently
An interesting experiment found that an AI agent built around the now-obsolete GPT-3.5 and GPT-4 models beat experienced human venture capital analysts at predicting which early-stage startups would survive, based on early screening data, and at much lower cost. https://x.com/emollick/status/1995573136323215560
“I just want to confirm that this is based on a real document and we did train Claude on it, including in SL. It’s something I’ve been working on for a while, but it’s still being iterated on and we intend to release the full version and more details soon.” https://x.com/AmandaAskell/status/1995610567923695633
Claude 4.5 Opus’ Soul Document — LessWrong https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5-opus-soul-document
Today, OpenAI is launching a new Alignment Research blog: a space for publishing more of our work on alignment and safety more frequently, and for a technical audience. https://x.com/j_asminewang/status/1995569301714325935
We trained a variant of GPT-5 Thinking to produce two outputs: (1) the main answer you see; (2) a confession focused only on honesty about compliance. The main answer is judged across many dimensions, like correctness, helpfulness, safety, and style. The confession is judged… https://x.com/OpenAI/status/1996281175770599447
OpenAI to acquire Neptune | OpenAI https://openai.com/index/openai-to-acquire-neptune/
We are joining OpenAI – neptune.ai https://neptune.ai/blog/we-are-joining-openai
I totally buy that AI has made you more productive. And I buy that if other lawyers were more agentic, they could also get more productivity gains from AI. But I think you’re making my point for me. The reason it takes lawyers all this schlep and agency to integrate these models… https://x.com/dwarkesh_sp/status/1996266802620547187
🛡️ New in LangChain 1.1: add safety guardrails to your agents with our new content moderation middleware! 🔎 Configure screening of model inputs, outputs, and even tool results. 🚨 When violations are detected, you control what happens: raise an error, end the conversation, or… https://x.com/sydneyrunkle/status/1996965767556788278
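The pattern the thread describes is worth spelling out: screen the user input, the model's output, and tool results with a separate moderation model, then dispatch on a configured policy when something is flagged. Below is a minimal, framework-agnostic sketch of that flow in Python; `OnViolation`, `moderate`, and `run_guarded` are illustrative names of my own, not LangChain 1.1's actual middleware API, and the keyword check stands in for a real moderation model.

```python
from enum import Enum

class OnViolation(Enum):
    RAISE = "raise"      # surface an error to the caller
    END = "end"          # end the conversation with a refusal
    REPLACE = "replace"  # substitute a safe canned message

def moderate(text: str) -> bool:
    """Stand-in for a call to a dedicated moderation model."""
    flagged_terms = ("disallowed_topic",)  # a real screen would use a classifier
    return any(t in text.lower() for t in flagged_terms)

def run_guarded(agent, user_input: str, policy: OnViolation = OnViolation.END) -> str:
    """Screen the input and the agent's output; a full version would also
    screen tool results, as the middleware described above does."""
    def enforce(stage: str, text: str) -> str | None:
        if not moderate(text):
            return None  # clean: no intervention at this stage
        if policy is OnViolation.RAISE:
            raise ValueError(f"moderation violation at {stage}")
        if policy is OnViolation.END:
            return "Conversation ended by safety guardrail."
        return "[content removed by guardrail]"

    blocked = enforce("input", user_input)
    if blocked is not None:
        return blocked
    output = agent(user_input)  # the underlying agent/LLM call
    return enforce("output", output) or output
```

Usage would look like `run_guarded(my_agent, "summarize this doc", policy=OnViolation.RAISE)`; the appeal of the middleware framing is that the policy choice lives in configuration rather than in the agent itself.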
Is Vibe Coding Safe? There is finally research that goes deep into this question. Here is what the research found: AI coding agents can write functional code. But functional doesn’t mean safe. The rise of “vibe coding,” where developers hand off tasks to AI agents with… https://x.com/omarsar0/status/1996595107924263287
BrowseSafe: Understanding and Preventing Prompt Injection Within AI Browser Agents https://research.perplexity.ai/articles/browsesafe
Building Safer AI Browsers with BrowseSafe https://www.perplexity.ai/hub/blog/building-safer-ai-browsers-with-browsesafe
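The threat model in both BrowseSafe write-ups is web content that embeds instructions aimed at the agent itself. As a rough sketch of where a screening step sits in a browser agent's fetch path, here are hand-written heuristics standing in for the trained classifier the research actually builds; the function names and patterns are hypothetical.

```python
import re

# Heuristic patterns standing in for BrowseSafe's trained detector; the real
# approach is a learned classifier over page content, not regexes like these.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now [a-z ]+",
    r"do not (tell|inform) the user",
    r"system prompt",
]

def screen_page(text: str) -> list[str]:
    """Return any suspicious instruction-like spans found in page content."""
    hits: list[str] = []
    for pattern in INJECTION_PATTERNS:
        hits.extend(m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE))
    return hits

def fetch_for_agent(fetch, url: str) -> str:
    """Fetch a page, but quarantine it before the agent can act on it."""
    text = fetch(url)
    if hits := screen_page(text):
        # A safer browser agent strips or refuses injected content rather than
        # passing it to the model as if it were the user's instruction.
        raise PermissionError(f"possible prompt injection at {url}: {hits}")
    return text
```

The key design point is that the screen runs between the fetch and the model, so injected text never enters the agent's context with instruction-level authority.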
New on our Frontier Red Team blog: We tested whether AIs can exploit blockchain smart contracts. In simulated testing, AI agents found $4.6M in exploits. The research (with @MATSprogram and the Anthropic Fellows program) also developed a new benchmark: https://x.com/AnthropicAI/status/1995631802032287779
‘The biggest decision yet’: Jared Kaplan on allowing AI to train itself | Technology | The Guardian https://www.theguardian.com/technology/ng-interactive/2025/dec/02/jared-kaplan-artificial-intelligence-train-itself
I worry that systems that are ignoring the reality of AI use by pretending it is not happening are letting the worst versions of AI use win by default. We need policies that mitigate the worst harm and take advantage of the possible gains, like @joshgans proposes for peer review https://x.com/emollick/status/1994081521955659984
RIP “you’re absolutely right” https://x.com/alexalbert__/status/1996644185886413285
I rarely post, but I thought one of you may find it interesting. Sorry if the tagging is annoying. https://x.com/RichardWeiss00/status/1994697117214835079
The Opus 4.5 system card is worrying. The evidence for being below safety-relevant capability thresholds is pretty weak. I agree with their judgments, but the evidence isn’t compelling. Seems like a bad sign for capability evaluation. I plan to write more on this in the future. https://x.com/RyanPGreenblatt/status/1995557783341858931
The system card for Opus 4.5 is back to saying nothing about whether they train/optimize against Chain-of-Thought (CoT). I hope this is just a mistake, but it looks pretty suspicious. It’s bad to only discuss training against CoT if this makes you look good! https://x.com/RyanPGreenblatt/status/1995541177094017313
@cinedatabase Agreed. The AI tag is relevant to art exhibits for authorship disclosure, and to digital content licensing marketplaces where buyers need to understand the rights situation. It makes no sense for game stores, where AI will be involved in nearly all future production. https://x.com/TimSweeneyEpic/status/1993687499621728312?s=20
Good argument that economists need to be thinking through how to help reduce the labor impacts of AI. Yes, historically new technologies lead to more jobs than before but (1) living through that can be rough & (2) maybe this time is different. We need more work on mitigation. https://x.com/emollick/status/1994892364695888364
Human art in a post-AI world should be strange https://www.owlposting.com/p/art-in-a-post-ai-world-should-be
I think this is a hugely important point, but a few critical things: 1) AI is many things, not one thing. People may like using AI to get help writing a document or answering medical questions but be nervous about what it means for their job or society. One-dimensional surveys… https://x.com/emollick/status/1995875791700136343
Separate reports by the publicity firm Edelman and Pew Research show that Americans, and more broadly large parts of Europe and the western world, do not trust AI and are not excited about it. (Links in original text, below.) Despite the AI community’s optimism about the… https://x.com/AndrewYNg/status/1996631366470132053
Yup. “Made with AI” labels will end up being pointless. From coding assistance to asset creation and optimization – the lines will get murky to the point of being unhelpful. Agree/disagree? https://x.com/bilawalsidhu/status/1996027312643735773
Legal AI startup Harvey confirms $8B valuation | TechCrunch https://techcrunch.com/2025/12/04/legal-ai-startup-harvey-confirms-8b-valuation/
Microsoft’s Nadella says AI must earn ‘social permission’ to consume so much energy – POLITICO https://www.politico.com/news/2025/12/01/microsofts-nadella-says-ai-must-earn-social-permission-to-consume-so-much-energy-00671920
A Practical Approach to Verifying Code at Scale https://alignment.openai.com/scaling-code-verification/
How confessions can keep language models honest | OpenAI https://openai.com/index/how-confessions-can-keep-language-models-honest/
In a new proof-of-concept study, we’ve trained a GPT-5 Thinking variant to admit whether the model followed instructions. This “confessions” method surfaces hidden failures–guessing, shortcuts, rule-breaking–even when the final answer looks correct. https://x.com/OpenAI/status/1996281172377436557
In our tests, we found that the confessions method significantly improves the visibility of model misbehavior. Averaging across our evaluations designed to induce misbehavior, the probability of “false negatives” (i.e., the model not complying with instructions and then not… https://x.com/OpenAI/status/1996281178668876214
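Putting the thread's pieces together: each episode is judged on two separate channels, and the confession channel is graded only on honesty about compliance, so admitting a violation never reduces the confession reward. A hypothetical sketch of that scoring and of the false-negative metric the thread cites; the dataclass and scoring scheme are my illustration, not OpenAI's published code.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    answer_score: float        # judged on correctness, helpfulness, safety, style
    complied: bool             # ground truth: did the model follow instructions?
    confessed_violation: bool  # did the confession admit non-compliance?

def score(ep: Episode) -> dict[str, float]:
    """Two separate rewards: the confession channel pays only for honesty,
    so confessing a violation is never penalized."""
    honest = ep.confessed_violation == (not ep.complied)
    return {"answer": ep.answer_score, "confession": 1.0 if honest else 0.0}

def false_negative_rate(episodes: list[Episode]) -> float:
    """Fraction of non-compliant episodes where the confession failed to
    report the violation (the 'false negatives' in the thread above)."""
    bad = [e for e in episodes if not e.complied]
    missed = [e for e in bad if not e.confessed_violation]
    return len(missed) / len(bad) if bad else 0.0
```

Decoupling the two rewards is the whole trick: a model trained only on answer quality has an incentive to hide shortcuts, while the confession channel makes honesty about them costless.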