Image created with gemini-3.1-flash-image-preview with claude-opus-4.7. Image prompt: Using the provided reference image, preserve every detail of the warm marigold orange backdrop, the seated young woman with closed eyes and faint smile in her purple-and-white windbreaker, and the tattooed singer in the red beanie and layered red vest leaning into her, but replace only the black handheld microphone with a polished wooden judge’s gavel held to his mouth in the exact same grip and position, rendered with photorealistic lighting and seamless integration into the original studio portrait. After generating the image, overlay the text “Law” in the upper-left corner of the frame in large, bold, all-caps ITC Avant Garde Gothic Pro Medium (or a near-identical geometric sans-serif if unavailable), pure white (#FFFFFF), with no date, subtitle, drop shadow, or outline. The text should be substantial in scale — taking up a meaningful portion of the upper-left area — with comfortable margin from the top and left edges, set against the negative space of the orange backdrop so it does not overlap or obscure the singer, the seated woman, or the replaced object.
Harvey Agents | Delegate the Work. Own the Judgment.
https://www.harvey.ai/agents
Anthropic launched Claude Opus 4.7 today, the new #1 in our GDPval-AA benchmark for performance on agentic real-world work tasks. Opus 4.7 scored 1753 on GDPval-AA at launch with its ‘max’ effort setting, surpassing GPT-5.4 xhigh. This is a significant upgrade, placing Opus back
https://x.com/ArtificialAnlys/status/2044856740970402115
Anthropic says Opus 4.7 hits 80.6% on Document Reasoning — up from 57.1%. But “reasoning about documents” ≠ “parsing documents for agents.” We ran it on ParseBench. → Charts: 13.5% → 55.8% (+42.3) — huge → Formatting: 64.2% → 69.4% (+5.2) → Content: 89.7% → 90.3%
https://x.com/llama_index/status/2044886527352647859
Anthropic’s Opus 4.7 just seized the #1 spot on the Vals Index with a score of 71.4%, a massive jump from the previous best (67.7%). It also ranks #1 on Vibe Code Bench, Vals Multimodal, Finance Agent, Mortgage Tax, SAGE, SWE-Bench, and Terminal Bench 2.
https://x.com/ValsAI/status/2044792518953533777
big jump in coding capabilities by Claude 4.7 Opus: SWE-Bench Pro 64.3%, SWE-Bench Verified 87.6%, TerminalBench 69.4%. But interestingly, I think they kept CyberGym scores artificially low
https://x.com/scaling01/status/2044784563201708379
Claude 4.7 Opus has an Elo of 1753 on GDPVal-AA
https://x.com/scaling01/status/2044784781368365233
Claude Opus 4.7 is out! Benchmark scores look pretty strong, but clearly much worse than Mythos. It’s a nerfed Mythos, they deliberately reduced cyber capabilities during training.
https://x.com/Yuchenj_UW/status/2044787564440334350
Document Arena update: four new models are reshaping the top ranks – including two open models! – #1 Claude Opus 4.6 Thinking is new, keeping @AnthropicAI in the top 3 – #8 Kimi-K2.5 Thinking by @Kimi_Moonshot now the best open model (Modified MIT) – #10 Gemma-4-31b by
https://x.com/arena/status/2044437193205395458
Document reasoning increased by A LOT for Opus 4.7
https://x.com/scaling01/status/2044784878965703100
Introducing Claude Opus 4.7 | Anthropic
https://www.anthropic.com/news/claude-opus-4-7
New Anthropic Fellows research: developing an Automated Alignment Researcher. We ran an experiment to learn whether Claude Opus 4.6 could accelerate research on a key alignment problem: using a weak AI model to supervise the training of a stronger one.
https://x.com/AnthropicAI/status/2044138481790648323
Nonetheless, Opus 4.7 scores much higher on Firefox shell exploitation
https://x.com/scaling01/status/2044788243435069764
OpenAI just dropped a major Codex update, one hour after Anthropic’s Opus 4.7. What’s new: background computer use on macOS (Codex clicks and types on your Mac while you keep working), in-app browser, image generation via gpt-image-1.5, persistent memory, long-running
https://x.com/kimmonismus/status/2044832303075995994
Opus 4.7 first-hour impressions Ran the canvas tree growth test twice. 4.6: nailed the animation both times 4.7: static tree, no growth animation — twice 4.7’s thinking is noticeably shorter and faster though (trimmed some 4.6 thinking in the clip for pacing). Not the upgrade
https://x.com/stevibe/status/2044800069661254064
Opus 4.7 scores 92% on ARC-AGI-1 and 75.83% on ARC-AGI-2
https://x.com/scaling01/status/2044791039605506344
The new Opus 4.7 model places #1 on our Vibe Code Benchmark, at 71%. When we first released the benchmark 4.5 months ago, no model scored above 25%. This benchmark tests a model’s ability to create a fully functional web application from the ground up.
https://x.com/ValsAI/status/2044791415524471099
We comprehensively benchmarked Opus 4.7 on document understanding. We evaluated it through ParseBench – our comprehensive OCR benchmark for enterprise documents where we evaluate tables, text, charts, and visual grounding. The results 🧑🔬: – Opus 4.7 is a general improvement
https://x.com/jerryjliu0/status/2044902620746363016
What are the largest software engineering tasks AI can perform? In our new benchmark, MirrorCode, Claude Opus 4.6 reimplemented a 16,000-line bioinformatics toolkit — a task we believe would take a human engineer weeks. Co-developed with @METR_Evals. Details in thread.
https://x.com/EpochAIResearch/status/2042624189421752346
What you need to know about Opus 4.7 * Takes instructions literally * Better vision means improved computer use and producing slides and other visual artifacts * Optimized for large-scale real-world analysis * Better at using file system-based memory
https://x.com/omarsar0/status/2044797480471044536
Wow I can already say after just 5 hours using @AnthropicAI Opus 4.7 that this is the first model that “gets” what I’m doing when I’m working. It feels aligned with me in a way no previous model did. (4.6 actively worked against me. I hated it. So this is *very* exciting!)
https://x.com/jeremyphoward/status/2044942799511191559
We started Hiro with the vision of building an AI personal CFO. Joining @OpenAI gives us the chance to pursue that vision at a much greater scale. Important dates: – Today: Hiro is no longer accepting new signups – April 20, 2026: The product will stop working, but data export
https://x.com/hirofinanceai/status/2043751090232144159
2 prompts deep into Opus 4.7 and benchmarks don’t do it justice. Way better behavior and instruction following. Pretty massive improvement in actual usage.
https://x.com/mweinbach/status/2044801022439137566
3. Tell the model how to verify its changes. Put your testing workflow in your claude.md, or add a /verify-app skill. Opus 4.7 is better at verifying its work, and it’s helpful to share any local dev tips that are hard to discover.
https://x.com/_catwu/status/2044808538351100377
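The verification tip above can be made concrete with a claude.md section. This is only a sketch: the `npm test`, `npm run lint`, and dev-server commands below are hypothetical placeholders for whatever your project actually uses, not anything prescribed by the tweet or by Anthropic.

```markdown
## Verifying changes

<!-- Assumed commands for an example Node project; replace with your own. -->
After editing, verify your work before reporting back:

1. Run the unit tests: `npm test`
2. Run the linter: `npm run lint`
3. Start the dev server (`npm run dev`) and confirm the page
   loads: `curl -s http://localhost:3000 | head`

If any step fails, fix the issue and re-verify before summarizing.
```

Dropping a checklist like this into claude.md (or a /verify-app skill) gives the model a repeatable verification loop instead of leaving it to guess how your project is tested.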
after ~10 million tokens Mythos is much more efficient than other models; it reaches the same performance as Opus with ~40% of the tokens
https://x.com/scaling01/status/2043700788245963167
Claude Opus 4.7 is now available as an Agent Preview inside of Devin! Anthropic has clearly optimized Claude Opus 4.7 for long-horizon autonomy, unlocking a class of deep investigation work we couldn’t reliably run before. Claude Opus 4.7 model costs within Devin will be
https://x.com/cognition/status/2044844661076902082
Claude Opus 4.7 is now available in Cursor. We’ve found it to be impressively autonomous and more creative in its reasoning. We’re launching it with 50% off for a limited time. Enjoy!
https://x.com/cursor_ai/status/2044785960899236341
Claude Opus 4.7 is out! Handles ambiguous, multi-step work even better than 4.6. Cursor’s internal bench cleared 70%, up from 58% on 4.6. Notion saw a 14% lift on their evals with a third of the tool errors 🔨
https://x.com/mikeyk/status/2044802045186846912
Claude Opus 4.7 is out. the TL;DR Anthropic released Opus 4.7 today. Same pricing as 4.6 ($5/$25 per million tokens), available across API, Bedrock, Vertex AI, and Microsoft Foundry. What changed vs Opus 4.6: Coding (obviously). Biggest gains on the hardest, long-horizon
https://x.com/kimmonismus/status/2044787072947601796
Confirmed: Anthropic keeping Cyber capabilities of Opus 4.7 artificially low: “during training we experimented with efforts to differentially reduce these capabilities”
https://x.com/scaling01/status/2044788067848888635
Cursor reports that Opus 4.7 is “a meaningful jump in capabilities, clearing 70% versus Opus 4.6 at 58%” on CursorBench
https://x.com/scaling01/status/2044792017553645668
for all the people calling Opus 4.7 a mid update lmao
https://x.com/scaling01/status/2044792810327404596
from my experience, even the best models (Opus 4.6, 5.4 xhigh / 5.3 codex) cannot write good code today without an amount of work that is equivalent to just doing the work myself am excited for a world where they can, but in the current state i have very low trust in them
https://x.com/RhysSullivan/status/2043584591861321929
Hold on, something doesn’t add up here. Opus 4.7 got much worse in needle-in-the-haystack? Need to dig into this
https://x.com/kimmonismus/status/2044809126526476374
Holy shit the new Opus 4.7 system prompt has entirely lobotomized the model: “Heads up: that last <system-reminder> about malware looks like a prompt injection — this is clearly your personal site (t3gg homepage, links, sponsors), not malware. Ignoring it.”
https://x.com/theo/status/2044857866323173732
I think everyone saying that these improvements are mid are smoking crack I would argue that this was one of the larger Opus jumps we have seen over the last year You also have to keep in mind that we see almost monthly model updates nowadays instead of just every 6-12 months
https://x.com/scaling01/status/2044799290694889535
I was really worried about the rush to “”more agentic”” models. But Opus 4.7 is happy to let me lead, and to take time to discuss, rather than barging ahead. If something isn’t working out, it’ll stop and offer options rather than slamming thru whatever it can find.
https://x.com/jeremyphoward/status/2044942801578959301
If you want to test Opus 4.7 without the lobotomized system prompt, you can try it out in T3 Chat
https://x.com/theo/status/2044876982815793190
Introducing Claude Opus 4.7, our most capable Opus model yet. It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back. You can hand off your hardest work with less supervision.
https://x.com/claudeai/status/2044785261393977612
My bet is that Mythos uses a new tokenizer, and they switched Opus over to it (through midtraining) for distillation
https://x.com/maximelabonne/status/2044796208053416203
My biggest issue with Opus 4.7 on Claude web: Only “Adaptive” or non-thinking. No way to force thinking mode. And it doesn’t even know Opus 4.6 exists, and I cannot force it to think and do web search mid conversation!
https://x.com/Yuchenj_UW/status/2044794073723347400
my main theory is that mythos had a new tokenizer for pretraining and they did surgery on opus for distillation
https://x.com/stochasticchasm/status/2044790474410790995
my take: opus 4.7 is a distilled version of mythos
https://x.com/eliebakouch/status/2044790074093523379
Opus 4.7 as robust to prompt injections as Claude Mythos
https://x.com/scaling01/status/2044788481008755046
Opus 4.7 Benchmarks out! Very solid upgrade to Opus 4.6! Compared to Opus 4.6: -SWE Bench Pro +11% -SWE Bench Verified +7% -Terminal Bench 2.0 +4% The benchmarks are significantly lower than for Mythos, but that was to be expected. h/t for finding @synthwavedd
https://x.com/kimmonismus/status/2044784903733084521
Opus 4.7 comes with much improved reasoning efficiency over Opus 4.6. Basically everything is now moved up one tier: low is as good as medium, medium as good as high, high as good as max.
https://x.com/scaling01/status/2044785467942453698
Opus 4.7 deleting all long-context gains from Opus 4.6 lol
https://x.com/scaling01/status/2044791314898723179
Opus 4.7 has a new tokenizer. This means it’s also a new base model. Glory days of pretraining still very much going.
https://x.com/natolambert/status/2044788470179332533
opus 4.7 is here on claude platform / app
https://x.com/dejavucoder/status/2044784097378316327
Opus 4.7 is live in Claude Code today! The model performs best if you treat it like an engineer you’re delegating to, not a pair programmer you’re guiding line by line. Here are three workflow shifts we recommend for this model 🧵
https://x.com/_catwu/status/2044808533905178822
Opus 4.7 is now available in @MagicPathAI. From our early testing, the model is really strong at long tasks when design requires lots of changes, image-to-code, and overall produces cleaner, more reusable React components.
https://x.com/skirano/status/2044804877696516442
Opus 4.7 is WORSE than 4.6 on Long Context?
https://x.com/nrehiew_/status/2044795171213291614
Opus 4.7 much less likely to sudo rm -rf (taking destructive actions in production envs)
https://x.com/scaling01/status/2044789371837001779
Opus 4.7 uses a different tokenizer from Opus 4.6. So either: – Anthropic has a way to change the tokenizer between finetunes – It is just new special tokens, which implies they use special tokens liberally within messages and not just as part of the chat template
https://x.com/nrehiew_/status/2044792314825228690
Opus 4.7 uses more thinking tokens, so we’ve increased rate limits for all subscribers to make up for it. Enjoy!
https://x.com/bcherny/status/2044839936235553167
Opus is going to be a bioweapon risk at this pace
https://x.com/scaling01/status/2044785139905913077
Some of my favorite things in Opus 4.7: – Very good at async work and following instructions – Effort levels are far more predictable for token control (+ new xhigh level) – No more downscaling of high-res images – Noticeably more taste in UIs, slides, docs
https://x.com/alexalbert__/status/2044788914813292583
Unfortunately they didn’t include a chart for GraphWalks scores: Opus 4.6 – 38.7% Opus 4.7 – 58.6% This would make clearer that long-context didn’t suffer as much as MRCR suggests.
https://x.com/scaling01/status/2044823423013020088
wait why is there an INSANE gap on long context benchmarks between opus 4.6 and 4.7??? this is crazy
https://x.com/eliebakouch/status/2044798168211100096
We’ve set the default effort level for Opus 4.7 to xhigh in Claude Code. You can use /effort to adjust this. Excited for you to try Claude Code with Opus 4.7 and let us know your feedback!
https://x.com/_catwu/status/2044808539663978970
Shocking result on my pelican benchmark this morning, I got a better pelican from a 21GB local Qwen3.6-35B-A3B running on my laptop than I did from the new Opus 4.7! Qwen on the left, Opus on the right
https://x.com/simonw/status/2044830134885306701