AI News #133: Week Ending April 17, 2026 with 50 Executive Summaries
About This Week’s Covers
This week’s cover celebrates a recent pop culture moment from Coachella, where Justin Bieber brought Billie Eilish on stage for “One Less Lonely Girl.”
There’s a nostalgic tradition when Justin sings “One Less Lonely Girl”: he brings someone up from the audience to sit on a stool and serenades them with the song.
At Coachella this month, Justin’s wife, Hailey, managed to get Billie Eilish out onto the stage for the end of “One Less Lonely Girl,” and Billie (a bona fide superstar herself and friends with the Biebers) sat on the stool and wholesomely lived out her childhood fan dream.
For this week’s cover, I took a picture of the moment and swapped Billie and Justin out for Figure 3 robots. The Figure 3 robot is featured in today’s newsletter, sorting packages and handling logistics jobs. I used Google Gemini to swap them, and I like how Gemini was able to get the tattoos onto the Figure robot’s arm without turning it into a human. Gemini is great at inpainting and retaining key elements of an image while changing the targets.
I used the fonts from Justin’s album My World 2.0 and approximated the album cover text as best I could using Photoshop.
For the rest of the covers, I asked Opus 4.7, which was released this week, what it thought I should do. Opus suggested being campy and swapping out the microphone for something that symbolizes each category.
Opus gave me a JSON file that drove the automatic generation of 60 category covers via the Gemini API, and some of them are fantastic. Camp achieved.
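If you want to replicate the loop, here’s a minimal sketch in Python using the google-genai SDK. The JSON schema, file names, and model string are my illustrative stand-ins, not the exact script Opus produced.

```python
# Minimal sketch: read a categories.json and generate one cover variant per
# category with the Gemini API. Schema, file names, and model string are
# illustrative assumptions, not the original script.
import json
import pathlib

from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# The base cover photo that each category variant edits.
base = types.Part.from_bytes(
    data=pathlib.Path("cover_base.jpg").read_bytes(),
    mime_type="image/jpeg",
)

# e.g. [{"category": "Robotics", "prompt": "Swap the microphone for a robot arm"}]
categories = json.loads(pathlib.Path("categories.json").read_text())

for entry in categories:
    response = client.models.generate_content(
        model="gemini-2.5-flash-image",  # placeholder; use your image-capable model
        contents=[entry["prompt"], base],
    )
    # Save the first image part Gemini returns for this category.
    for part in response.candidates[0].content.parts:
        if part.inline_data:
            out = pathlib.Path(f"cover_{entry['category']}.png")
            out.write_bytes(part.inline_data.data)
            break
```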
Humanities Reading for the Week
This week, I threw together a Claude skill to pick a humanities reading to go with any theme I give it. I’m going to see how well it does, and to be honest, I think it will beat me fairly handily, even though I was an English major.
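For anyone curious, a skill is just a folder with a SKILL.md in it: a bit of YAML frontmatter (name, description) followed by plain-language instructions. A minimal sketch of what a reading-picker skill could look like (illustrative wording, not my exact file):

```markdown
---
name: humanities-reading
description: Given a weekly newsletter theme, propose five humanities readings (poems, essays, short stories) that resonate with it.
---

When the user supplies a theme:

1. Brainstorm candidate readings across poetry, essays, and short fiction.
2. Narrow to five, favoring pieces that are easy to link or quote.
3. For each, give author, title, year, and a two-sentence “why it fits.”
4. Mark one top pick and one runner-up.
```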
I gave the skill this week’s theme. The skill delivered five readings for me to choose from, and I went with Mary Oliver’s “Wild Geese”.
Humanities Counterpoint Ideas — One Less Lonely Girl

Mary Oliver — “Wild Geese” (1986)
Why it fits: The whole poem is about belonging arriving when you stop asking for it — “you do not have to be good,” you only have to let your soft animal love what it loves. Young Billie loving Bieber from her bedroom is the soft-animal version; the Coachella moment is the world calling her home. It also captures the healing moments of childhood pop stars who grew up in public.
Runner-Up: Joni Mitchell — “The Circle Game” (1970)
Why it fits: Mitchell’s song is the whole arc compressed — the child painted on a carousel, the years going round, the kid who looked up eventually looking back. The Bieber-Eilish moment is one full revolution of the circle game, made literal.
Wild Geese | Mary Oliver

You do not have to be good.
You do not have to walk on your knees
for a hundred miles through the desert, repenting.
You only have to let the soft animal of your body
love what it loves.
Tell me about despair, yours, and I will tell you mine.
Meanwhile the world goes on.
Meanwhile the sun and the clear pebbles of the rain
are moving across the landscapes,
over the prairies and the deep trees,
the mountains and the rivers.
Meanwhile the wild geese, high in the clean blue air,
are heading home again.
Whoever you are, no matter how lonely,
the world offers itself to your imagination,
calls to you like the wild geese, harsh and exciting –
over and over announcing your place
in the family of things.

https://livelovesimple.com/wild-geese-mary-oliver/
This week, I organized 554 links. 131 of them informed the executive summaries.
I’ve taken a few top stories and moved them to the top for emphasis. In particular, there’s a lot of Anthropic news that’s important to float.
After those handful of major stories, I’ve organized everything by company in alphabetical order. There’s news from Adobe, Anthropic, Cloudflare, Google, Meta, Microsoft, Nvidia, OpenAI, Perplexity, Qwen, and some robotics news.
A few points to note from the week (before we get to the links):
AI pioneer Andrej Karpathy points out that there seems to be a growing gap in the understanding of AI capability. He calls out two particular vectors: One is how recently people have used a tool, and the other is which model version or tier of subscription they use.
The collective perception of AI is a complete cluster depending on experience, and oddly… no one is necessarily wrong… that’s what’s wild.
We each have legitimately different experiences when we interact with language models. Until everyone starts to get the same experiences, it’s going to be chaotic. The antidote is to try at least one subscription tier and get in there and play with it.
The top story of the week continues to be Mythos, Anthropic’s unreleased flagship model. It dominated the news last week, and again this week.
Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell summoned an emergency meeting with bank CEOs to review Mythos, based on the warnings from Anthropic and research from closed testing.
An internal Anthropic survey showed that 12 out of 18 employees think Mythos can now manage a day-long ambiguous task, and 8 out of 18 think Mythos can execute a week-long task. One-third of the surveyed employees think that entry-level engineers and researchers will be completely replaced within three months.
As for publicly released products, Anthropic has redesigned Claude Code on the desktop to enable parallel agents, which means teams of agents can work at the same time.
In a new software engineering benchmark, Claude Opus 4.6 successfully re-implemented a 16,000-line codebase, a task that would have taken a human engineer several weeks.
Claude Opus 4.7 was released this week, and that would usually be the top story if Mythos didn’t exist. It is now in the top spot in agentic coding benchmarks and document reasoning.
Benchmarking company Epoch released a report showing that five companies now control two-thirds of the world’s computing power, up from 60% in 2024 to about 67% in 2026. Google, Microsoft, Meta, Amazon, and Oracle are the big five.
Google is adding powerful AI modes to Chrome, including reusable skills and saved prompts. The examples Google uses include transforming any recipe into a vegan option or generating shopping comparisons. Autonomous browser use and saved shortcuts are going to get incredibly powerful as Google starts to integrate skills and agency into Chrome.
A few weeks ago, ByteDance released Seedance 2.0. It’s a very powerful video tool. I haven’t seen too much come out of it since. However, Ethan Mollick used his pretend BBC drama called Duckerton to demo Seedance against OpenAI’s shuttered Sora engine. The prompt is “an elaborate Regency romance where everyone is wearing a live duck for a hat. Each duck is also wearing a hat, and a llama plays a flute in the background.” It’s a good prompt: complicated, and a solid test of intricate compositions.
Legal AI provider Harvey has launched law firm AI agents. Several months ago, Harvey announced they’re essentially trying to take the jobs of entry-level lawyers. There was a marketing and promotional video that was jarring when I watched it because it wasn’t hiding the fact that they were coming after lower-level jobs. In fact, that was the selling point of the product.
And now, with 14 specialty legal agents, Harvey is aiming to replace entire teams of junior associates. Harvey has agents for: mergers and acquisitions, capital markets, asset management funds, real estate, private equity, commercial litigation, bankruptcy and restructuring, investigations and white collar, tax, IP, and antitrust and competition, as well as in-house employment, labor, and regulatory compliance.
Microsoft’s new medical AI can detect cancer from a $10 tissue sample.
Someone threw a Molotov cocktail at Sam Altman’s house in San Francisco. Sam posted a photo of his family to appeal to people’s empathy.
A personal finance startup called Hiro is joining OpenAI with the vision of building a personal CFO.
For the first time that I’m aware of, two open models have cracked the top ten of a document-understanding leaderboard. Moonshot’s Kimi, at number 8, is the best open model, with Google’s Gemma at number 10.
In sobering news, Snap, aka Snapchat, is cutting 1,000 jobs, attributing the cuts to AI efficiency and the ability to replace human workers.
Here are the week’s top stories, along with all the links and images you need to explore them.
Alignment
The perceived AI capability gap widens between casual users and power users Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is https://x.com/karpathy/status/2042334451611693415
Anthropic’s unreleased Mythos model triggers job worries Internal Anthropic survey on Claude Mythos Preview 12/18 people thought that Mythos can manage day long ambiguous tasks 8/18 thought that it can execute week long tasks https://x.com/scaling01/status/2044787521691742338
This was… an interesting one. Reminder that we run independent evals on our cyber ranges that labs don’t have access to. Exploitation capabilities are getting seriously good. Mythos is the first model to complete our full 32-step corporate network attack sim E2E. https://x.com/ekinomicss/status/2043688793085992970
Claude Code desktop redesign for parallel agents in collaborative workflows 2. Give Claude Code your full task context upfront: goal, constraints, acceptance criteria in the first turn. This lets Claude Code do its best work. https://x.com/_catwu/status/2044808536790847693
Claude Opus 4.6 rebuilds 16,000-line bioinformatics toolkit on new benchmark. Would have taken an engineer weeks. What are the largest software engineering tasks AI can perform? In our new benchmark, MirrorCode, Claude Opus 4.6 reimplemented a 16,000-line bioinformatics toolkit — a task we believe would take a human engineer weeks. Co-developed with @METR_Evals. Details in thread. https://x.com/EpochAIResearch/status/2042624189421752346
Industry first: Anthropic releases new model while admitting better one exists internally First model from Anthropic, which openly acknowledges it isn’t the best model they have https://x.com/nrehiew_/status/2044791293080121553
Anthropic launched Claude Opus 4.7 today, the new #1 in our GDPval-AA benchmark for performance on agentic real-world work tasks Opus 4.7 scored 1753 on GDPval-AA at launch with its ‘max’ effort setting, surpassing GPT-5.4 xhigh. This is a significant upgrade, placing Opus back https://x.com/ArtificialAnlys/status/2044856740970402115
Anthropic says Opus 4.7 hits 80.6% on Document Reasoning — up from 57.1%. But “reasoning about documents” ≠ “parsing documents for agents.” We ran it on ParseBench. → Charts: 13.5% → 55.8% (+42.3) — huge → Formatting: 64.2% → 69.4% (+5.2) → Content: 89.7% → 90.3% https://x.com/llama_index/status/2044886527352647859
Anthropic’s Opus 4.7 just seized the #1 spot on the Vals Index with a score of 71.4%, a massive jump from the previous best (67.7%). It also ranks #1 on Vibe Code Bench, Vals Multimodal, Finance Agent, Mortgage Tax, SAGE, SWE-Bench, and Terminal Bench 2. https://x.com/ValsAI/status/2044792518953533777
big jump in coding capabilities by Claude 4.7 Opus SWE-Bench Pro 64.3% SWE-Bench Verified 87.6% TerminalBench 69.4% but interestingly, I think they kept CyberGym scores artificially low https://x.com/scaling01/status/2044784563201708379
Claude Opus 4.7 is out! Benchmark scores look pretty strong, but clearly much worse than Mythos. It’s a nerfed Mythos, they deliberately reduced cyber capabilities during training. https://x.com/Yuchenj_UW/status/2044787564440334350
Opus 4.7 first-hour impressions Ran the canvas tree growth test twice. 4.6: nailed the animation both times 4.7: static tree, no growth animation — twice 4.7’s thinking is noticeably shorter and faster though (trimmed some 4.6 thinking in the clip for pacing). Not the upgrade https://x.com/stevibe/status/2044800069661254064
The new Opus 4.7 model places #1 on our Vibe Code Benchmark, at 71%. When we first released the benchmark 4.5 months ago, no model scored above 25%. This benchmark tests a model’s ability to create a fully functional web application from the ground up. https://x.com/ValsAI/status/2044791415524471099
We comprehensively benchmarked Opus 4.7 on document understanding. We evaluated it through ParseBench – our comprehensive OCR benchmark for enterprise documents where we evaluate tables, text, charts, and visual grounding. The results 🧑🔬: – Opus 4.7 is a general improvement https://x.com/jerryjliu0/status/2044902620746363016
What you need to know about Opus 4.7 * Takes instructions literally * Better vision means improved computer use and producing slides and other visual artifacts * Optimized for large-scale real-world analysis * Better at using file system-based memory https://x.com/omarsar0/status/2044797480471044536
Wow I can already say after just 5 hours using @AnthropicAI Opus 4.7 that this is the first model that “gets” what I’m doing when I’m working. It feels aligned with me in a way no previous model did. (4.6 actively worked against me. I hated it. So this is *very* exciting!) https://x.com/jeremyphoward/status/2044942799511191559
Figure
Figure’s Helix-02 robot: autonomous package sorting at scale Also the robot you are seeing here is doing a high-volume package logistic use case It’s doing it fully autonomous with Helix-02
Introducing Vulcan – a new AI robot controller
This is an important safety feature that allows F.03 to lose up to 3 actuators in the lower body and still hobble around without falling
Five companies now control two-thirds of global AI compute Five companies — Google, Microsoft, Meta, Amazon, and Oracle — now control about two-thirds of the world’s compute, up slightly from ~60% at the start of 2024. Many AI labs (including OpenAI and Anthropic) depend almost entirely on these hyperscalers for access to their compute. https://x.com/EpochAIResearch/status/2044154042541301870
Today we shipped a new Search experience in @googlechrome to help you explore the web without constant back and forth between tabs. Now, when you click a link from AI Mode, the website opens side-by-side. It’s a game changer for comparing details and products across sites, https://x.com/rmstein/status/2044828926057333050
We’re introducing a new Search experience in @GoogleChrome that lets you open webpages side-by-side with AI Mode – no tab switching required. Now, you’ll be able to compare details and ask follow-up questions while still maintaining the context of your search, whether you’re https://x.com/Google/status/2044832732274901489
Today, we’re introducing Skills in @GoogleChrome, a new way to build one-click workflows for your most frequently used AI prompts — like asking for ingredient substitutions to make a recipe vegan, generating side-by-side shopping comparisons across multiple tabs, or scanning long https://x.com/Google/status/2044106378655215625
Seedance 2.0 video examples Impressed that Seedance 2.0 can pull off “a mech battle between Neanderthal and Homo Sapiens” so well. (This is exactly what happened, historically) https://x.com/emollick/status/2043571794024489143
Harvey launches autonomous legal agents spanning fourteen practice areas Harvey Agents | Delegate the Work. Own the Judgment. https://www.harvey.ai/agents
Microsoft
Microsoft AI detects cancer from cheap, standard tissue slides Microsoft’s AI can now detect cancer from a $10 tissue sample. For context, every time tumor cells are tested, doctors create a basic microscope slide to study tissue up close. These slides show cell shapes and structures, but they can’t reveal which immune cells are actually https://x.com/rowancheung/status/2043709190515114008
Personal-finance AI startup Hiro shuts down, team joins OpenAI We started Hiro with the vision of building an AI personal CFO. Joining @OpenAI gives us the chance to pursue that vision at a much greater scale. Important dates: – Today: Hiro is no longer accepting new signups – April 20, 2026: The product will stop working, but data export https://x.com/hirofinanceai/status/2043751090232144159
Open Source
Open-source models crack top ten of document AI leaderboard Document Arena update: four new models are reshaping the top ranks – including two open models! – #1 Claude Opus 4.6 Thinking is new, keeping @AnthropicAI in the top 3 – #8 Kimi-K2.5 Thinking by @Kimi_Moonshot now the best open model (Modified MIT) – #10 Gemma-4-31b by https://x.com/arena/status/2044437193205395458
Adobe launches conversational AI agent to power Creative Cloud workflows Adobe Ushers in a New Era of Creativity with New Creative Agent and Generative AI Innovations in Adobe Firefly https://news.adobe.com/news/2026/04/adobe-new-creative-agent
Anthropic tests Claude as automated AI alignment researcher New Anthropic Fellows research: developing an Automated Alignment Researcher. We ran an experiment to learn whether Claude Opus 4.6 could accelerate research on a key alignment problem: using a weak AI model to supervise the training of a stronger one. https://x.com/AnthropicAI/status/2044138481790648323
give your agents a browser. Browser Run (fka Browser Rendering) really sprinted for Agents Week 🏃♀️ quick look at what shipped 1) Browser Rendering –> Browser Run (renamed!) 2) Live View – realtime view of browser sessions 3) Human in the Loop – intervene when your agent needs https://x.com/kathyyliao/status/2044479579382026484
Google DeepMind’s new robotics model controls Boston Dynamics’ Spot via spoken commands Gemini Robotics ER 1.6: Enhanced Embodied Reasoning — Google DeepMind https://deepmind.google/blog/gemini-robotics-er-1-6/
Instead of writing complex code, the team interacted with Spot using plain English. We built a bridge between Gemini Robotics ER and Spot’s system, giving the AI a basic set of tools to move freely, take photos, and grab things – enabling it to carry out more complex tasks. https://x.com/GoogleDeepMind/status/2044763631858909269
Robotics is making progress! 🤖 We just released @GoogleDeepMind Gemini Robotics-ER 1.6 for enhanced embodied reasoning. – Unlocks instrument reading capabilities for complex gauges and sight glasses. – Achieves 93% success on instrument reading tasks using agentic vision. – https://x.com/_philschmid/status/2044071114578509971
We teamed up with @BostonDynamics to power their robot Spot with Gemini Robotics embodied reasoning models. This means it can better understand its surroundings, identify objects and follow simple commands – like tidying up a room. https://x.com/GoogleDeepMind/status/2044763625680765408
We’re rolling out an upgrade designed to help robots reason about the physical world. 🤖 Gemini Robotics-ER 1.6 has significantly better visual and spatial understanding in order to plan and complete more useful tasks. Here’s why this is important 🧵 https://x.com/GoogleDeepMind/status/2044069878781390929
Google launches Gemini 3.1 Flash TTS with director-style voice controls Gemini 3.1 Flash TTS is our most controllable text-to-speech model yet. With new Audio Tags, you can easily direct vocal style, delivery, and pace through text commands. 🧵 https://x.com/GoogleDeepMind/status/2044447030353752349
Google’s new Gemini 3.1 Flash TTS ranks #2 on the Artificial Analysis Speech Arena Leaderboard, ahead of ElevenLabs’ Eleven v3 and only behind Inworld TTS 1.5 Max Gemini 3.1 Flash TTS represents a significant step forward for Google from previous TTS models, with notably https://x.com/ArtificialAnlys/status/2044450045190418673
Introducing Gemini 3.1 Flash TTS 🗣️, our latest text to speech model with scene direction, speaker level specificity, audio tags, more natural + expressive voices, and support for 70 different languages. Available via our new audio playground in AI Studio and in the Gemini API! https://x.com/OfficialLoganK/status/2044447596010435054
Our most expressive and steerable TTS model yet! Designed to give builders granular control over AI-generated speech, Gemini 3.1 Flash TTS is really fun to play with! Available in preview today – for devs via the Gemini API & @GoogleAIStudio + for enterprises on Vertex AI https://x.com/demishassabis/status/2044599020690010217
Google launches Gemini desktop app for macOS users Introducing Gemini on Mac. It’s the first time we’re bringing the @Geminiapp to desktop. The team built this initial release with @Antigravity, and it went from an idea to a native Swift app prototype in a few days. More features on the way! https://x.com/sundarpichai/status/2044452464724967550
Introducing Gemini on Mac. We heard your feedback. We recruited a small team. They built 100+ features in less than 100 days. 🤯 100% native Swift. Lightning fast. Let us know what you think! https://x.com/joshwoodward/status/2044452201947627709
The Gemini app is now on Mac. With this new desktop app, you can access Gemini from any screen with Option + Space and share your window to get answers based on the documents, code, or data you’re working on. https://x.com/GeminiApp/status/2044445911716090212
Researcher claims AI tool completed complex personal tax filing in an hour I took the new Muse Spark to the ultimate test: filing my taxes – 3 different workplaces, consulting, stocks, foreign bank accounts and assets, and kids. One hour later, I had everything done. AGI is here… cc: @alexandr_wang https://x.com/ziv_ravid/status/2044237898351030538
Multi-agent AI system speeds up NVIDIA CUDA kernels 38% We’ve been developing a multi-agent system that builds and maintains complex software autonomously. Recently, we partnered with NVIDIA to apply it to optimizing CUDA kernels. In 3 weeks, it delivered a 38% geomean speedup across 235 problems. https://x.com/cursor_ai/status/2044136953239740909
Nvidia releases 120B-parameter Nemotron 3 Super for efficient agentic reasoning Banger paper from NVIDIA. Agentic reasoning needs models that are not just capable, but efficient at long-context inference. The agent model layer is moving toward open, long-context, high-throughput architectures. This paper introduces Nemotron 3 Super, an open 120B parameter https://x.com/dair_ai/status/2044452957023047943
Nvidia’s Jensen Huang downplays Google TPU threat to GPU dominance I asked Jensen: “2 out of the top 3 models in the world, Claude and Gemini, were trained on TPU. What does that mean for Nvidia going forward?” After a long technical back and forth about what the right accelerator for AI looks like (see full episode), Jensen lays down the https://x.com/dwarkesh_sp/status/2044468295957635392
NVIDIA’s Lyra 2.0 builds persistent, explorable 3D worlds Today, we released Lyra 2.0, a framework for generating persistent, explorable 3D worlds at scale, from NVIDIA Research. Generating large-scale, complex environments is difficult for AI models. Current models often “forget” what spaces look like and lose track of movement over https://x.com/NVIDIAAIDev/status/2044445645109436672
OpenAI
GPT-5.4 Pro cracks decades-old Erdős Problem #1196 with ‘elegant proof’ This is becoming a pattern in AI that makes talking about capabilities challenging. First, there are overstated claims (like the flubbed Erdos problems last year), then minor wins (AI helps with discovery) then breakthroughs. The first stage feels like (& often is) hype, but… https://x.com/emollick/status/2044455311118074124
GPT-5.4 Pro solves Erdős Problem #1196! Very pleased with this result; definitely my favourite thus far! This problem has been thought about for some time which makes this reasonably impressive and meaningful (see Lichtman’s comments below). Formalisation is underway! https://x.com/Liam06972452/status/2044051379916882067
In my doctorate, I proved the Erdős Primitive Set Conjecture, showing that the primes themselves are maximal among all primitive sets. This problem will always be in my heart: I worked on it for 4 years (even when my mentors recommended against it!) and loved every minute of it. https://x.com/jdlichtman/status/2044298382852927894
More on GPT-5.4 Pro’s latest mathematical contribution: “The closest analogy I would give would be that the main openings in chess were well-studied, but AI discovers a new opening line that had been overlooked based on human aesthetics and convention.” https://x.com/gdb/status/2044436998648193333
Paul Erdos had a concept of “Proofs from The Book”, meaning that the argument is so compact and elegant that this is the proof God would’ve written down in “The Book.” After reading the GPT5.4 proof of Erdos #1196, I would say this is a Book Proof of the result. The conjecture https://x.com/jdlichtman/status/2044307082275618993
OpenAI expands Codex into a Mac-controlling AI super app OpenAI just dropped a major Codex update, one hour after Anthropic’s Opus 4.7. What’s new: background computer use on macOS (Codex clicks and types on your Mac while you keep working), in-app browser, image generation via gpt-image-1.5, persistent memory, long-running https://x.com/kimmonismus/status/2044832303075995994
Codex for (almost) everything. It can now use apps on your Mac, connect to more of your tools, create images, learn from previous actions, remember how you like to work, and take on ongoing and repeatable tasks. https://x.com/OpenAI/status/2044827705406062670
Perplexity
Personal Computer turns Mac mini into always-on AI agent When set up on a Mac mini, Personal Computer can run 24/7 in the background across all your apps and files. Start a task from your iPhone, and Personal Computer can operate on your desktop and local files using 2FA. Requires the latest iOS update from the App Store. https://x.com/perplexity_ai/status/2044806021244497964
Qwen
Alibaba’s Qwen3.6 matches rivals using a fraction of compute ⚡ Meet Qwen3.6-35B-A3B: Now Open-Source! 🚀🚀 A sparse MoE model, 35B total params, 3B active. Apache 2.0 license. 🔥 Agentic coding on par with models 10x its active size 📷 Strong multimodal perception and reasoning ability 🧠 Multimodal thinking + non-thinking modes https://x.com/Alibaba_Qwen/status/2044768734234243427
LM Performance: Qwen3.6-35B-A3B outperforms the dense 27B-param Qwen3.5-27B on several key coding benchmarks and dramatically surpasses its direct predecessor Qwen3.5-35B-A3B, especially on agentic coding and reasoning tasks. https://x.com/Alibaba_Qwen/status/2044768738294268199
VLM Performance: Qwen3.6 is natively multimodal, and Qwen3.6-35B-A3B showcases perception and multimodal reasoning capabilities that far exceed what its size would suggest, with only around 3 billion activated parameters. Across most vision-language benchmarks, its performance https://x.com/Alibaba_Qwen/status/2044768742761189762
Alibaba released Qwen3.6-35B-A3B today. Big jump compared to Qwen 3.5-35B model. It’s a sparse MoE, 35B total params, only 3B active. Natively multimodal, thinking and non-thinking modes. Hardfacts: SWE-bench Verified: 73.4, near dense Qwen3.5-27B (75.0), way ahead of https://x.com/kimmonismus/status/2044780695361290347
Robotics
Kia sets 2028 timeline for Atlas humanoids in Georgia plant Kia’s CEO announced a phased roadmap for deploying the Boston Dynamics Atlas humanoid: – 2028: Full-scale deployment of Atlas at Hyundai Motor Group Metaplant America (HMGMA) in Georgia. – Second half of 2029: Expansion to Kia AutoLand (Georgia) and other global Group https://x.com/TheHumanoidHub/status/2042231158889759160
Skild AI’s general-purpose robot brain self-corrects during cooking task Skild Brain preparing an omelet with everyday human tools. The robot drops an eggshell into the bowl at one point but recovers and continues the task. The ability to self-correct during edge cases is what will make robots dependable for complex, long-horizon missions. https://x.com/TheHumanoidHub/status/2044492735420502300
Automated Executive Summaries with the Same Links, Generated by Claude Haiku 4.5 (I test it every week to see how it does)
Treasury Secretary and Fed Chair signal policy concerns over Claude’s capabilities. U.S. financial regulators called an emergency meeting with major bank leaders to discuss Anthropic’s latest AI model, suggesting policymakers see specific risks that warrant immediate inter-agency coordination with the private sector. The unprecedented summons of both Treasury and Federal Reserve leadership indicates this goes beyond routine tech monitoring—either the model’s capabilities crossed a threshold regulators flagged as systemically important, or its deployment could affect financial stability and banking operations. This represents a rare convergence of monetary policy and financial supervision around a single commercial AI product.
Internal Anthropic survey finds Mythos preview shows mixed results on sustained task performance. In an internal survey of 18 Anthropic employees, two-thirds believed Claude’s Mythos preview could manage day-long ambiguous tasks, but only 44 percent thought it could handle week-long assignments. More strikingly, nearly one-third of respondents assessed that Mythos could replace entry-level engineers and researchers within three months—a significant but not overwhelming consensus on near-term job displacement risk. The findings suggest Mythos shows capability gains but with notable limitations in task duration and uncertain real-world employment impact.
Internal Anthropic survey on Claude Mythos Preview 12/18 people thought that Mythos can manage day long ambiguous tasks 8/18 thought that it can execute week long tasks https://x.com/scaling01/status/2044787521691742338
Mythos model can autonomously attack poorly defended networks; gaps in AI security assessments grow. Anthropic’s Claude Mythos Preview succeeded in executing a full 32-step corporate network attack in tests—a feat no prior AI model achieved—raising real concerns about autonomous cyber threats. The AI Security Institute’s evaluation shows Mythos completed complex multi-stage attacks that would take human experts days of work, though the tests lacked real-world defenses like active monitoring. Simultaneously, Anthropic confirmed briefing the Trump administration on Mythos’s capabilities while facing a Pentagon lawsuit over military AI access restrictions, highlighting tensions between national security oversight and private AI development.
This was… an interesting one. Reminder that we run independent evals on our cyber ranges that labs don’t have access to. Exploitation capabilities are getting seriously good. Mythos is the first model to complete our full 32-step corporate network attack sim E2E. https://x.com/ekinomicss/status/2043688793085992970
Claude Code desktop redesign for parallel agents in collaborative workflows Claude launches a redesigned desktop application enabling developers to orchestrate multiple AI-powered coding tasks simultaneously, with integrated terminal, file editor, and enhanced diff viewing—shifting agentic work from sequential to parallel execution where developers act as orchestrators reviewing and steering results in real time.
2. Give Claude Code your full task context upfront: goal, constraints, acceptance criteria in the first turn. This lets Claude Code do its best work. https://x.com/_catwu/status/2044808536790847693
I appreciate you providing the material, but I need to flag a significant issue: the image file you’ve included appears to be corrupted or improperly formatted (it contains binary JPEG data that didn’t render). I cannot extract readable text from it. However, I **can** work with the text snippet you provided at the top: — Claude Opus successfully reimplements massive bioinformatics software in weeks. Anthropic’s Claude Opus 4.6 completed a full reimplementation of a 16,000-line bioinformatics toolkit—a task estimated to require weeks of human engineering—establishing new benchmarks for AI-assisted large-scale software development with the MirrorCode evaluation framework. — **To provide a complete, accurate summary**, could you please: 1. Re-upload the image clearly, or 2. Paste the article text directly as plain text? This will ensure I deliver a factual, punchy two-line summary that meets your editorial standards.
What are the largest software engineering tasks AI can perform? In our new benchmark, MirrorCode, Claude Opus 4.6 reimplemented a 16,000-line bioinformatics toolkit — a task we believe would take a human engineer weeks. Co-developed with @METR_Evals. Details in thread. https://x.com/EpochAIResearch/status/2042624189421752346
Claude 3 Haiku: smaller model trades capability for speed and cost. Anthropic released Claude 3 Haiku, its smallest and fastest model, while publicly stating it isn’t their most powerful offering—a deliberate strategy to serve different use cases rather than chase benchmark rankings. The move reflects a maturing AI market where companies optimize for real-world tradeoffs (speed, affordability, accuracy) rather than raw performance, and Anthropic’s willingness to be transparent about model limitations signals confidence in having better alternatives for demanding tasks.
Claude Opus 4.7 achieves top performance on agentic work, surpassing competitors in real-world task execution. Anthropic’s Claude Opus 4.7 takes the number-one spot on its GDPval-AA benchmark for agentic work—tasks requiring agents to plan and execute autonomously in the real world. The model scores 1,753 at maximum effort, outperforming previous leaders. Beyond raw capability, users report it handles extended workflows reliably: it catches logical errors during planning, follows instructions precisely, and verifies its own outputs before reporting results. On document reasoning, Opus 4.7 jumped from 57.1% to 80.6%; on chart parsing, gains reached +42.3 percentage points. These improvements matter because they translate to fewer errors in production systems where agents run autonomously without human supervision.
Anthropic launched Claude Opus 4.7 today, the new #1 in our GDPval-AA benchmark for performance on agentic real-world work tasks Opus 4.7 scored 1753 on GDPval-AA at launch with its ‘max’ effort setting, surpassing GPT-5.4 xhigh. This is a significant upgrade, placing Opus back https://x.com/ArtificialAnlys/status/2044856740970402115
Anthropic says Opus 4.7 hits 80.6% on Document Reasoning — up from 57.1%. But “reasoning about documents” ≠ “parsing documents for agents.” We ran it on ParseBench. → Charts: 13.5% → 55.8% (+42.3) — huge → Formatting: 64.2% → 69.4% (+5.2) → Content: 89.7% → 90.3% https://x.com/llama_index/status/2044886527352647859
Anthropic’s Opus 4.7 just seized the #1 spot on the Vals Index with a score of 71.4%, a massive jump from the previous best (67.7%). It also ranks #1 on Vibe Code Bench, Vals Multimodal, Finance Agent, Mortgage Tax, SAGE, SWE-Bench, and Terminal Bench 2. https://x.com/ValsAI/status/2044792518953533777
big jump in coding capabilities by Claude 4.7 Opus SWE-Bench Pro 64.3% SWE-Bench Verified 87.6% TerminalBench 69.4% but interestingly, I think they kept CyberGym scores artificially low https://x.com/scaling01/status/2044784563201708379
Claude Opus 4.7 is out! Benchmark scores look pretty strong, but clearly much worse than Mythos. It’s a nerfed Mythos, they deliberately reduced cyber capabilities during training. https://x.com/Yuchenj_UW/status/2044787564440334350
Opus 4.7 first-hour impressions Ran the canvas tree growth test twice. 4.6: nailed the animation both times 4.7: static tree, no growth animation — twice 4.7’s thinking is noticeably shorter and faster though (trimmed some 4.6 thinking in the clip for pacing). Not the upgrade https://x.com/stevibe/status/2044800069661254064
The new Opus 4.7 model places #1 on our Vibe Code Benchmark, at 71%. When we first released the benchmark 4.5 months ago, no model scored above 25%. This benchmark tests a model’s ability to create a fully functional web application from the ground up. https://x.com/ValsAI/status/2044791415524471099
We comprehensively benchmarked Opus 4.7 on document understanding. We evaluated it through ParseBench – our comprehensive OCR benchmark for enterprise documents where we evaluate tables, text, charts, and visual grounding. The results 🧑🔬: – Opus 4.7 is a general improvement https://x.com/jerryjliu0/status/2044902620746363016
What you need to know about Opus 4.7 * Takes instructions literally * Better vision means improved computer use and producing slides and other visual artifacts * Optimized for large-scale real-world analysis * Better at using file system-based memory https://x.com/omarsar0/status/2044797480471044536
Wow I can already say after just 5 hours using @AnthropicAI Opus 4.7 that this is the first model that “gets” what I’m doing when I’m working. It feels aligned with me in a way no previous model did. (4.6 actively worked against me. I hated it. So this is *very* exciting!) https://x.com/jeremyphoward/status/2044942799511191559
I need more substantive material to summarize. The snippet provided lacks key details: what company built this robot, what specific results were achieved, when did this happen, and what makes Helix-02 distinctive? Please provide the full article, press release, or source material so I can write an accurate two-line summary with concrete facts, impact assessment, and evidence.
Five hyperscalers now control two-thirds of global AI compute capacity Google, Microsoft, Meta, Amazon, and Oracle have increased their combined control of computing power for AI systems to roughly 67% as of mid-2024, up from 60% at the start of the year. This concentration matters because leading AI labs like OpenAI and Anthropic depend almost entirely on these five companies for infrastructure access, creating a potential bottleneck for AI development and raising questions about who can afford to build advanced AI systems.
Five companies — Google, Microsoft, Meta, Amazon, and Oracle — now control about two-thirds of the world’s compute, up slightly from ~60% at the start of 2024. Many AI labs (including OpenAI and Anthropic) depend almost entirely on these hyperscalers for access to their compute. https://x.com/EpochAIResearch/status/2044154042541301870
Google Chrome adds side-by-side web browsing with integrated AI assistance. Google has rolled out updates to AI Mode in Chrome that let users open websites alongside AI search results without switching tabs, plus a new “Skills” feature that saves frequently-used AI prompts as one-click tools. The update matters because it addresses a real friction point in web research—the constant tab-switching—while also democratizing AI automation by letting everyday users build custom workflows for tasks like comparing products or analyzing recipes. Early testers reportedly stayed more focused on their tasks, and the feature is now live in the US with global expansion planned.
Today we shipped a new Search experience in @googlechrome to help you explore the web without constant back and forth between tabs. Now, when you click a link from AI Mode, the website opens side-by-side. It’s a game changer for comparing details and products across sites, https://x.com/rmstein/status/2044828926057333050
We’re introducing a new Search experience in @GoogleChrome that lets you open webpages side-by-side with AI Mode – no tab switching required. Now, you’ll be able to compare details and ask follow-up questions while still maintaining the context of your search, whether you’re https://x.com/Google/status/2044832732274901489
Today, we’re introducing Skills in @GoogleChrome, a new way to build one-click workflows for your most frequently used AI prompts — like asking for ingredient substitutions to make a recipe vegan, generating side-by-side shopping comparisons across multiple tabs, or scanning long https://x.com/Google/status/2044106378655215625
Text generation model produces historically plausible visual narratives consistently. Seedance 2.0 demonstrates improved ability to generate coherent, historically grounded imagery across multiple attempts—the model reliably rendered a Neanderthal-versus-Homo-sapiens scenario with appropriate visual details rather than producing inconsistent or anachronistic results. This matters because consistency in AI-generated content suggests the system has developed better understanding of historical context, moving beyond random recombination of training data. The key evidence is the model’s repeated success on the same prompt without degradation, indicating advances in how visual AI systems encode and apply factual constraints.
Impressed that Seedance 2.0 can pull off “a mech battle between Neanderthal and Homo Sapiens” so well. (This is exactly what happened, historically) https://x.com/emollick/status/2043571794024489143
Harvey deploys AI agents to automate legal work while lawyers retain final decision-making. Harvey’s AI agents execute multi-step legal tasks—research, document review, drafting, and formatting—across specialties from M&A to tax, returning polished work for lawyer review rather than replacing judgment calls. The model aims to free attorneys from routine execution to focus on strategy and client decisions, with all outputs grounded in source documents and editable before approval.
AI system spots cancer markers in cheap tissue samples. Microsoft developed an AI tool that identifies cancer from standard $10 microscope slides—the routine tissue samples doctors already prepare for every tumor patient. The system analyzes cell structures to detect immune markers that typically require expensive lab tests, potentially making cancer diagnosis faster and more accessible globally.
Microsoft’s AI can now detect cancer from a $10 tissue sample. For context, every time tumor cells are tested, doctors create a basic microscope slide to study tissue up close. These slides show cell shapes and structures, but they can’t reveal which immune cells are actually https://x.com/rowancheung/status/2043709190515114008
OpenAI CEO Sam Altman’s home attacked with firebomb, suspect arrested. A 20-year-old man was arrested after throwing a Molotov cocktail at Sam Altman’s San Francisco home early Friday morning, with the device bouncing off the metal gate and causing no injuries. The same suspect then made threats at OpenAI’s headquarters and was apprehended by police. Altman responded with a blog post acknowledging the power of words and narratives in fueling tensions around AI, calling for de-escalation of rhetoric in debates over the technology’s societal impact.
OpenAI commits thirty billion dollars to Cerebras chip servers over three years OpenAI agreed to spend more than $20 billion on Cerebras computing servers, with potential total spending reaching $30 billion, while gaining a potential 10% equity stake and providing $1 billion for data center development. The deal underscores the immense infrastructure costs required to power AI systems at scale—specifically the computational processes that generate AI responses—and positions Cerebras for an anticipated public offering this quarter at a $35 billion valuation. This represents a significant bet by OpenAI on an alternative to Nvidia’s dominant chip position in AI infrastructure.
Hiro personal finance AI shutters after OpenAI acquisition deal. Hiro, an AI tool that managed personal finances for users, is closing down after its founders joined OpenAI, suggesting the company absorbed the team rather than the product. The shutdown—effective April 2026—highlights how acquisition talent grabs can eliminate consumer-facing AI tools, raising questions about whether acquired AI startups survive as standalone products or dissolve into larger organizations pursuing different priorities.
We started Hiro with the vision of building an AI personal CFO. Joining @OpenAI gives us the chance to pursue that vision at a much greater scale. Important dates: – Today: Hiro is no longer accepting new signups – April 20, 2026: The product will stop working, but data export https://x.com/hirofinanceai/status/2043751090232144159
I cannot produce the requested summary because the provided material is corrupted image data (JPEG file encoded as text) rather than readable text content. To create a proper two-line executive summary, I would need the actual article text describing the Document Arena rankings update, the new models mentioned (Claude Opus 4.6 Thinking, Kimi-K2.5 Thinking, Gemma-4-31b), and what makes these developments significant. **Please provide the readable article text, and I’ll deliver the summary in the exact format requested.**
Document Arena update: four new models are reshaping the top ranks – including two open models! – #1 Claude Opus 4.6 Thinking is new, keeping @AnthropicAI in the top 3 – #8 Kimi-K2.5 Thinking by @Kimi_Moonshot now the best open model (Modified MIT) – #10 Gemma-4-31b by https://x.com/arena/status/2044437193205395458
Snap cuts 16 percent of workforce while doubling down on AI efficiency. CEO Evan Spiegel announced layoffs affecting roughly 1,000 employees and 300+ open positions, targeting $500 million in annual cost reductions and a path to profitability. The company explicitly frames AI tools as enabling smaller teams to work faster on priorities like ad performance and infrastructure, signaling a strategic shift toward automation over headcount as a cost-control mechanism.
Adobe launches conversational AI that automates complex creative tasks across its apps. Adobe introduced Firefly AI Assistant, an AI agent that lets creators describe desired outcomes and automatically orchestrates multi-step workflows across Photoshop, Premiere, Lightroom and other Creative Cloud tools in one conversation. The announcement matters because it shifts the boundary of AI creative assistance from single-tool enhancement to full-scale project orchestration while keeping creators in control. Adobe also expanded Firefly with 30+ AI video and image models, precision editing tools, and studio-quality audio capabilities.
Anthropic executive quits Figma board as AI lab enters design market. Mike Krieger, Anthropic’s chief product officer, resigned from Figma’s board after the AI company announced design tools in its upcoming Opus 4.7 model that would compete directly with Figma’s core product. The move crystallizes investor fears that large AI labs will cannibalize established software businesses—a concern that has depressed software stocks significantly this year—though Figma’s stock actually rose 5% after the disclosure, suggesting markets remain uncertain whether AI companies can truly match specialized software incumbents.
Anthropic tests AI system to guide training of smarter AI models. Anthropic’s Claude Opus 4.6 successfully assisted with a core challenge in AI safety: using weaker models to oversee stronger ones during training. This matters because as AI systems grow more powerful, humans need reliable methods to keep them aligned with intended behavior—and automating that oversight could scale safety practices faster than manual approaches alone.
New Anthropic Fellows research: developing an Automated Alignment Researcher. We ran an experiment to learn whether Claude Opus 4.6 could accelerate research on a key alignment problem: using a weak AI model to supervise the training of a stronger one. https://x.com/AnthropicAI/status/2044138481790648323
Cloudflare launches Browser Run to give AI agents direct web control. Cloudflare renamed and upgraded its Browser Rendering product to Browser Run, a service that lets AI agents autonomously navigate websites, fill forms, and extract data at scale. New features include real-time visibility into agent actions, human intervention when agents get stuck, direct access to Chrome’s control protocol for maximum flexibility, and support for popular AI coding tools like Claude Desktop. The upgrade addresses a practical gap: as AI agents become more capable, they need reliable infrastructure to interact with the web independently.
give your agents a browser. Browser Run (fka Browser Rendering) really sprinted for Agents Week 🏃♀️ quick look at what shipped 1) Browser Rendering –> Browser Run (renamed!) 2) Live View – realtime view of browser sessions 3) Human in the Loop – intervene when your agent needs https://x.com/kathyyliao/status/2044479579382026484
Gemma 4 outperforms Claude in key reasoning benchmarks, sparking user migrations. Google’s latest Gemma 4 model has begun pulling users away from Anthropic’s Claude, particularly among developers using OpenClaw tools, because it delivers stronger performance on complex reasoning and coding tasks at a competitive price point. The shift signals that open-source and smaller models are closing the gap with established leaders, giving enterprises more options for cost-effective AI deployment without sacrificing capability.
Google DeepMind releases robotics AI model with practical real-world skills. Google DeepMind unveiled Gemini Robotics ER 1.6, an AI model that helps robots understand their physical surroundings and execute complex tasks through visual and spatial reasoning. The system was tested with Boston Dynamics’ Spot robot, which can now interpret instrument gauges, identify objects, and complete commands like tidying rooms by processing simple English instructions—marking a shift toward more practical, language-guided robot autonomy.
Instead of writing complex code, the team interacted with Spot using plain English. We built a bridge between Gemini Robotics ER and Spot’s system, giving the AI a basic set of tools to move freely, take photos, and grab things – enabling it to carry out more complex tasks. https://x.com/GoogleDeepMind/status/2044763631858909269
Robotics is making progress! 🤖 We just released @GoogleDeepMind Gemini Robotics-ER 1.6 for enhanced embodied reasoning. – Unlocks instrument reading capabilities for complex gauges and sight glasses. – Achieves 93% success on instrument reading tasks using agentic vision. – https://x.com/_philschmid/status/2044071114578509971
We teamed up with @BostonDynamics to power their robot Spot with Gemini Robotics embodied reasoning models. This means it can better understand its surroundings, identify objects and follow simple commands – like tidying up a room. https://x.com/GoogleDeepMind/status/2044763625680765408
We’re rolling out an upgrade designed to help robots reason about the physical world. 🤖 Gemini Robotics-ER 1.6 has significantly better visual and spatial understanding in order to plan and complete more useful tasks. Here’s why this is important 🧵 https://x.com/GoogleDeepMind/status/2044069878781390929
Google releases text-to-speech model with granular vocal control. Google launched Gemini 3.1 Flash TTS, a speech synthesis model that lets developers direct vocal style, pace, and tone through embedded text commands rather than technical parameters. The model ranks second on industry benchmarks, supports 70+ languages, and includes SynthID watermarking to identify AI-generated audio—addressing both creative control and misinformation concerns as speech synthesis becomes more realistic and widely deployed.
Gemini 3.1 Flash TTS is our most controllable text-to-speech model yet. With new Audio Tags, you can easily direct vocal style, delivery, and pace through text commands. 🧵 https://x.com/GoogleDeepMind/status/2044447030353752349
Google’s new Gemini 3.1 Flash TTS ranks #2 on the Artificial Analysis Speech Arena Leaderboard, ahead of ElevenLabs’ Eleven v3 and only behind Inworld TTS 1.5 Max Gemini 3.1 Flash TTS represents a significant step forward for Google from previous TTS models, with notably https://x.com/ArtificialAnlys/status/2044450045190418673
Introducing Gemini 3.1 Flash TTS 🗣️, our latest text to speech model with scene direction, speaker level specificity, audio tags, more natural + expressive voices, and support for 70 different languages. Available via our new audio playground in AI Studio and in the Gemini API! https://x.com/OfficialLoganK/status/2044447596010435054
Our most expressive and steerable TTS model yet! Designed to give builders granular control over AI-generated speech, Gemini 3.1 Flash TTS is really fun to play with! Available in preview today – for devs via the Gemini API & @GoogleAIStudio + for enterprises on Vertex AI https://x.com/demishassabis/status/2044599020690010217
Google launches native Gemini app for Mac computers. Google has released a free native macOS version of its Gemini AI assistant, allowing users to summon it with a keyboard shortcut (Option + Space) without switching windows. The app lets users share their screen directly with Gemini to get instant summaries and help on documents and files they’re actively working on, marking a shift toward integrating AI assistance into everyday desktop workflows rather than requiring users to visit a separate interface.
Introducing Gemini on Mac. It’s the first time we’re bringing the @Geminiapp to desktop. The team built this initial release with @Antigravity, and it went from an idea to a native Swift app prototype in a few days. More features on the way! https://x.com/sundarpichai/status/2044452464724967550
Introducing Gemini on Mac. We heard your feedback. We recruited a small team. They built 100+ features in less than 100 days. 🤯 100% native Swift. Lightning fast. Let us know what you think! https://x.com/joshwoodward/status/2044452201947627709
The Gemini app is now on Mac. With this new desktop app, you can access Gemini from any screen with Option + Space and share your window to get answers based on the documents, code, or data you’re working on. https://x.com/GeminiApp/status/2044445911716090212
Google transforms NotebookLM into visual research workspace with Canvas feature Google is testing Canvas, a visualization tool that converts source documents into interactive timelines, web pages, and other visual formats directly within NotebookLM. The update also includes Connectors to link external data sources and labeling tools for organizing large document libraries, shifting NotebookLM from a summarization assistant toward a comprehensive research and content-creation platform. This moves Google’s AI tools closer to functioning as central hubs for workplace productivity rather than isolated applications.
Google builds desktop agent tool to rival Anthropic’s Cowork platform. Google is testing an “Agent” tab in Gemini Enterprise that moves beyond simple chatbots into task execution, featuring goal-setting, tool integration, file access, and a “require human review” toggle. The workspace design mirrors competitors like Claude’s Cowork, suggesting Google is building a multi-step workflow system for business users—a distinctive shift toward desktop-level automation rather than web-based chat. This appears part of a broader platform expansion ahead of Google I/O, positioning Gemini directly against Anthropic and OpenAI’s agent-focused products.
Google embeds shopping cart and checkout directly into Gemini Google is testing a built-in Shopping Cart feature in its Gemini AI assistant, allowing users to browse and purchase products without leaving the app—extending its Universal Commerce Protocol announced in January 2026. This development, combined with newly launched Skills (reusable automation workflows) and auto-browsing capabilities, suggests Google is building Gemini into a desktop application that rivals ChatGPT and emerging AI browsers by integrating shopping, web browsing, and task automation in one interface. The features remain unreleased but are expected to be formally announced at Google I/O on May 19–20.
AI tool handles complex multi-source tax filing in one hour. Muse Spark, a new AI application, completed a complicated tax return involving three employers, freelance income, investments, and international accounts in roughly 60 minutes—a task that typically requires hours of manual work or professional help. While the claim invites skepticism (and the poster disclaims it’s not tax advice), it suggests AI is moving beyond simple tasks into knowledge work that demands cross-referencing multiple income sources and regulatory rules.
I took the new Muse Spark to the ultimate test: filing my taxes – 3 different workplaces, consulting, stocks, foreign bank accounts and assets, and kids. One hour later, I had everything done. AGI is here… cc: @alexandr_wang https://x.com/ziv_ravid/status/2044237898351030538
OpenAI Stargate executives defect to lead Meta’s new compute division. Key OpenAI personnel involved in Stargate—the company’s massive AI infrastructure project—are joining Meta to oversee a new internal unit focused on data center operations and AI compute capacity. This poaching reflects intensifying competition among tech giants to acquire talent and accelerate AI development, signaling that building computational infrastructure has become as critical to AI leadership as algorithm breakthroughs themselves.
Microsoft developing locally-running AI agent for enterprise 365 tasks. Microsoft is building an autonomous agent inspired by OpenClaw that would run directly on users’ computers to handle complex, multi-step tasks within its 365 software suite—differentiating itself from existing cloud-based Copilot tools through stronger security controls. The move signals Microsoft’s intent to compete with the increasingly popular open-source OpenClaw project while maintaining enterprise-grade safeguards, with the company expected to unveil details at its June Build conference.
Microsoft takes over OpenAI’s failed Norwegian AI data center project. Microsoft has leased 30,000 specialized processors at a Norwegian Arctic data center that OpenAI originally planned to build but abandoned due to high energy costs and regulatory hurdles. The deal expands Microsoft’s existing $6.2 billion footprint at the Narvik facility, which uses renewable hydropower to support AI computing while reducing carbon emissions—positioning Europe to reduce dependence on non-EU infrastructure for advanced AI workloads.
AI agents autonomously optimize graphics computing code at scale. A multi-agent system developed for autonomous software maintenance delivered a 38% geometric-mean speedup across 235 CUDA kernel optimization tasks in just three weeks. This demonstrates that AI systems can now handle specialized, performance-critical code optimization—traditionally requiring scarce human expertise—suggesting meaningful productivity gains for computing-intensive industries like AI training and scientific research.
We’ve been developing a multi-agent system that builds and maintains complex software autonomously. Recently, we partnered with NVIDIA to apply it to optimizing CUDA kernels. In 3 weeks, it delivered a 38% geomean speedup across 235 problems. https://x.com/cursor_ai/status/2044136953239740909
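For anyone parsing “38% geomean speedup”: the geometric mean of per-task speedups is the standard way to aggregate benchmark results, because it keeps a single outlier kernel from inflating the average the way an arithmetic mean would. A minimal illustration in Python (the numbers are invented, not from the actual 235-task run):

```python
import math

# Per-task speedups: baseline_time / optimized_time for each kernel task.
# Illustrative values only, not results from the NVIDIA/Cursor benchmark.
speedups = [1.05, 1.8, 1.2, 2.4, 1.1]

# Geometric mean: exponentiate the average of the logs.
geomean = math.exp(sum(math.log(s) for s in speedups) / len(speedups))
print(f"geomean speedup: {geomean:.2f}x")  # a 38% improvement would read 1.38x
```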
NVIDIA open-sources massive model for long-reasoning AI agents. NVIDIA released Nemotron 3 Super, a 120-billion-parameter open model designed specifically for AI agents that need to process and reason over very long documents or conversation histories efficiently. The move signals a shift in the industry toward making powerful reasoning models openly available rather than proprietary, which matters because businesses building AI agents now have alternatives to expensive closed systems for handling complex, multi-step tasks.
Banger paper from NVIDIA. Agentic reasoning needs models that are not just capable, but efficient at long-context inference. The agent model layer is moving toward open, long-context, high-throughput architectures. This paper introduces Nemotron 3 Super, an open 120B parameter https://x.com/dair_ai/status/2044452957023047943
NVIDIA’s quantum AI models aim to accelerate practical quantum computing breakthroughs. NVIDIA released Ising, a set of open-source AI models designed to help researchers develop quantum computers that actually solve real-world problems rather than remaining theoretical. The move matters because quantum computing has long promised revolutionary capabilities but remains stuck at small scales; AI-assisted design could significantly speed up the engineering work. NVIDIA is positioning itself as essential infrastructure for the quantum computing era by providing free tools that lower barriers for competitors and researchers alike.
Competing chip makers now train top AI models, challenging Nvidia’s dominance. Anthropic’s and Google’s choice to train Claude and Gemini on Google’s TPU chips—rather than Nvidia’s market-leading GPUs—signals that alternatives are becoming viable for cutting-edge AI development. This matters because Nvidia’s dominance in AI training hardware has been nearly absolute; viable competition could reshape a multibillion-dollar market and give companies more options for controlling costs and supply chains.
I asked Jensen: “2 out of the top 3 models in the world, Claude and Gemini, were trained on TPU. What does that mean for Nvidia going forward?” After a long technical back and forth about what the right accelerator for AI looks like (see full episode), Jensen lays down the https://x.com/dwarkesh_sp/status/2044468295957635392
NVIDIA releases framework for AI to generate and remember 3D worlds. NVIDIA’s Lyra 2.0 solves a specific problem: existing AI struggles to create large, consistent 3D environments because models forget spatial details and lose track of movement over time. This matters because persistent virtual worlds are essential for gaming, simulation, and metaverse applications—and the ability to generate them at scale could reduce production costs significantly. The framework represents a technical breakthrough in spatial consistency rather than general model improvement.
Today, we released Lyra 2.0, a framework for generating persistent, explorable 3D worlds at scale, from NVIDIA Research. Generating large-scale, complex environments is difficult for AI models. Current models often “forget” what spaces look like and lose track of movement over https://x.com/NVIDIAAIDev/status/2044445645109436672
AI solves decades-old math problem by finding an overlooked, elegant proof. GPT-5.4 Pro solved Erdős Problem #1196, a long-open mathematical conjecture, with a proof so elegant it matches mathematicians’ ideal of mathematical beauty. The breakthrough matters because it demonstrates AI’s specific value in research: finding solutions humans overlooked not due to incapability but due to aesthetic bias and convention. This follows a pattern where initial AI hype eventually yields genuine discoveries, though distinguishing the two stages remains challenging for observers.
This is becoming a pattern in AI that makes talking about capabilities challenging. First, there are overstated claims (like the flubbed Erdos problems last year), then minor wins (AI helps with discovery) then breakthroughs. The first stage feels like (& often is) hype, but… https://x.com/emollick/status/2044455311118074124
GPT-5.4 Pro solves Erdős Problem #1196! Very pleased with this result; definitely my favourite thus far! This problem has been thought about for some time which makes this reasonably impressive and meaningful (see Lichtman’s comments below). Formalisation is underway! https://x.com/Liam06972452/status/2044051379916882067
In my doctorate, I proved the Erdős Primitive Set Conjecture, showing that the primes themselves are maximal among all primitive sets. This problem will always be in my heart: I worked on it for 4 years (even when my mentors recommended against it!) and loved every minute of it. https://x.com/jdlichtman/status/2044298382852927894
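For readers who want the precise statement behind “the primes themselves are maximal”: Lichtman’s doctoral theorem (a separate result from Erdős Problem #1196) concerns primitive sets, sets of integers greater than 1 in which no element divides another. Erdős showed the sum below converges for every primitive set; the conjecture, now a theorem, is that the primes maximize it:

```latex
% Erdős primitive set conjecture (Lichtman's theorem):
% for every primitive set A,
f(A) \;=\; \sum_{a \in A} \frac{1}{a \log a}
\;\le\; \sum_{p\ \mathrm{prime}} \frac{1}{p \log p} \;=\; f(\mathbb{P})
```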
More on GPT-5.4 Pro’s latest mathematical contribution: “The closest analogy I would give would be that the main openings in chess were well-studied, but AI discovers a new opening line that had been overlooked based on human aesthetics and convention.” https://x.com/gdb/status/2044436998648193333
Paul Erdos had a concept of “Proofs from The Book”, meaning that the argument is so compact and elegant that this is the proof God would’ve written down in “The Book.” After reading the GPT5.4 proof of Erdos #1196, I would say this is a Book Proof of the result. The conjecture https://x.com/jdlichtman/status/2044307082275618993
OpenAI transforms Codex into an all-in-one developer workspace platform. OpenAI released a major Codex update enabling the tool to control your Mac in the background, integrate with 90+ apps and plugins, generate images, and remember user preferences—effectively expanding it from a coding assistant into a comprehensive development environment that can handle tasks spanning code writing, design, and project management. The move directly counters Anthropic’s similar push with Claude and represents a strategic shift across the AI industry toward unified desktop applications powered by autonomous agents rather than standalone tools.
OpenAI just dropped a major Codex update, one hour after Anthropic’s Opus 4.7. What’s new: background computer use on macOS (Codex clicks and types on your Mac while you keep working), in-app browser, image generation via gpt-image-1.5, persistent memory, long-running https://x.com/kimmonismus/status/2044832303075995994
Codex for (almost) everything. It can now use apps on your Mac, connect to more of your tools, create images, learn from previous actions, remember how you like to work, and take on ongoing and repeatable tasks. https://x.com/OpenAI/status/2044827705406062670
OpenAI launches specialized AI model to accelerate drug discovery research. OpenAI released GPT-Rosalind, a specialized AI system designed to help scientists move faster through the drug discovery process by synthesizing research literature, generating hypotheses, and planning experiments—work that typically takes 10-15 years from target discovery to approval. The model outperforms general-purpose AI on tasks involving molecules, proteins, and genes, and is being deployed with major pharmaceutical companies like Amgen and Moderna through a controlled access program that includes safeguards against misuse. This represents a shift toward domain-specific AI tools that address concrete bottlenecks in high-stakes scientific workflows rather than general capability improvements.
OpenAI launches cyber-specialized AI model for verified security defenders. OpenAI is expanding its Trusted Access for Cyber program to thousands of verified defenders and scaling a new GPT-5.4-Cyber model specifically fine-tuned for cybersecurity work with relaxed safety restrictions. The move matters because it addresses the dual reality that AI tools can accelerate both defenders finding vulnerabilities and attackers exploiting them—requiring OpenAI to deliberately match defensive capabilities with model advancement rather than waiting for risk thresholds to emerge. Evidence includes over 3,000 critical vulnerabilities fixed through their Codex Security tool in recent months and a $10M grant program supporting defenders across the ecosystem.
OpenAI exec warns staff that AI market is “as competitive as ever.” OpenAI’s chief revenue officer circulated an internal memo emphasizing the need to lock in users and expand enterprise business to fend off rivals like Anthropic. The memo highlights a strategic shift toward building “moats”—features that make switching to competitors harder—rather than pursuing smaller projects, reflecting intensifying competition in the commercial AI market where customers rapidly shift between models based on performance.
OpenAI upgrades developer toolkit for autonomous AI agents. OpenAI released an updated Agents SDK that lets developers build AI agents capable of inspecting files, running commands, and editing code within secure sandbox environments—similar to giving an AI a controlled workspace with specific tools and instructions. The update is distinctive because it provides standardized, model-optimized infrastructure designed specifically for OpenAI’s systems, addressing a key developer need for safe, long-running autonomous tasks. This matters as businesses increasingly deploy AI agents for complex workflows; the sandboxed approach limits risk by containing what agents can access and execute.
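As a rough picture of what that looks like in practice, here is a minimal sketch using the openai-agents Python package’s documented Agent/Runner/function_tool interface; the sandboxed command tool is my own illustrative stand-in, not the update’s actual tooling:

```python
import os
import subprocess

from agents import Agent, Runner, function_tool

# Hypothetical sandbox directory; a real deployment would use stronger isolation.
SANDBOX = "/tmp/agent-sandbox"
os.makedirs(SANDBOX, exist_ok=True)

@function_tool
def run_command(cmd: str) -> str:
    """Run a shell command, confined to the sandbox working directory."""
    out = subprocess.run(cmd, shell=True, capture_output=True,
                         text=True, timeout=30, cwd=SANDBOX)
    return out.stdout + out.stderr

agent = Agent(
    name="repo-maintainer",
    instructions="Inspect files, run commands, and propose edits. Stay inside the sandbox.",
    tools=[run_command],
)

result = Runner.run_sync(agent, "List the files here and summarize what you find.")
print(result.final_output)
```

The point of the pattern is containment: the model can only touch the world through the tools you hand it, so limiting what `run_command` can reach limits what the agent can do.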
Personal Computer AI agent operates continuously across Mac devices and files. Perplexity’s new Personal Computer feature runs as a persistent background agent on Mac mini hardware, able to access your desktop and local files and complete tasks initiated from your iPhone, all while requiring two-factor authentication for security. This marks a shift toward always-on AI assistants with deep system access, rather than chat-based tools, and it depends on users installing the latest iOS update to function.
When set up on a Mac mini, Personal Computer can run 24/7 in the background across all your apps and files. Start a task from your iPhone, and Personal Computer can operate on your desktop and local files using 2FA. Requires the latest iOS update from the App Store. https://x.com/perplexity_ai/status/2044806021244497964
Alibaba open-sources efficient AI model rivaling much larger competitors. Alibaba released Qwen3.6-35B-A3B, an open-source AI model using sparse architecture to activate only 3 billion of its 35 billion total parameters, dramatically reducing computational costs. The model matches or exceeds the performance of far larger competitors on coding and reasoning tasks—including achieving 73.4% on software engineering benchmarks—while supporting both images and text. This represents a significant efficiency breakthrough: the same capabilities at a fraction of the computing power required by dense models, making advanced AI more practical and accessible.
⚡ Meet Qwen3.6-35B-A3B: Now Open-Source!🚀🚀 A sparse MoE model, 35B total params, 3B active. Apache 2.0 license. 🔥 Agentic coding on par with models 10x its active size 📷 Strong multimodal perception and reasoning ability 🧠 Multimodal thinking + non-thinking modes https://x.com/Alibaba_Qwen/status/2044768734234243427
LM Performance: Qwen3.6-35B-A3B outperforms the dense 27B-param Qwen3.5-27B on several key coding benchmarks and dramatically surpasses its direct predecessor Qwen3.5-35B-A3B, especially on agentic coding and reasoning tasks. https://x.com/Alibaba_Qwen/status/2044768738294268199
VLM Performance: Qwen3.6 is natively multimodal, and Qwen3.6-35B-A3B showcases perception and multimodal reasoning capabilities that far exceed what its size would suggest, with only around 3 billion activated parameters. Across most vision-language benchmarks, its performance https://x.com/Alibaba_Qwen/status/2044768742761189762
Alibaba released Qwen3.6-35B-A3B today. Big jump compared to Qwen 3.5-35B model. It’s a sparse MoE, 35B total params, only 3B active. Natively multimodal, thinking and non-thinking modes. Hardfacts: SWE-bench Verified: 73.4, near dense Qwen3.5-27B (75.0), way ahead of https://x.com/kimmonismus/status/2044780695361290347
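If the “35B total, 3B active” framing is unfamiliar: in a sparse mixture-of-experts (MoE) model, a small gating network routes each token to its top-k expert sub-networks, so only a fraction of the total weights participate in any forward pass. A toy sketch of that routing in Python (sizes and k are illustrative, not Qwen’s actual configuration):

```python
import torch
import torch.nn.functional as F

def moe_forward(x, gate, experts, k=2):
    """Route each token to its top-k experts; only those experts run."""
    scores = gate(x)                           # (tokens, n_experts)
    weights, idx = scores.topk(k, dim=-1)      # choose k experts per token
    weights = F.softmax(weights, dim=-1)       # normalize the chosen scores
    out = torch.zeros_like(x)
    for j in range(k):
        for e in range(len(experts)):
            mask = idx[:, j] == e              # tokens whose j-th pick is expert e
            if mask.any():
                out[mask] += weights[mask, j:j+1] * experts[e](x[mask])
    return out

# Toy sizes: 8 experts exist, but each token only activates 2 of them.
d = 16
experts = [torch.nn.Linear(d, d) for _ in range(8)]
gate = torch.nn.Linear(d, 8)
tokens = torch.randn(5, d)
print(moe_forward(tokens, gate, experts).shape)  # torch.Size([5, 16])
```

Total parameter count scales with the number of experts, but per-token compute scales only with k, which is how a 35B-parameter model can run with roughly 3B parameters active.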
Kia plans to deploy Boston Dynamics humanoid robots by 2028. Kia’s CEO outlined a concrete timeline for using Boston Dynamics’ Atlas robots in actual manufacturing plants, starting with Hyundai’s Georgia facility in 2028 and expanding to Kia’s facility by late 2029. This marks one of the first major commitments by a legacy automaker to deploy humanoid robots at scale in production environments, signaling growing confidence that the technology can handle real industrial work rather than remaining experimental.
Kia’s CEO announced a phased roadmap for deploying the Boston Dynamics Atlas humanoid: – 2028: Full-scale deployment of Atlas at Hyundai Motor Group Metaplant America (HMGMA) in Georgia. – Second half of 2029: Expansion to Kia AutoLand (Georgia) and other global Group https://x.com/TheHumanoidHub/status/2042231158889759160
Skild AI’s robot learns to recover from mistakes while cooking an omelet. The robot demonstrated real-time error correction—dropping an eggshell but continuing the task successfully—suggesting progress toward robots that can handle unpredictable real-world situations. This self-recovery capacity matters because previous robots typically failed completely when encountering unexpected problems, whereas handling minor mistakes mid-task is essential for robots to work reliably in homes and other uncontrolled environments.
Skild Brain preparing an omelet with everyday human tools. The robot drops an eggshell into the bowl at one point but recovers and continues the task. The ability to self-correct during edge cases is what will make robots dependable for complex, long-horizon missions. https://x.com/TheHumanoidHub/status/2044492735420502300
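The control pattern behind that demo is worth spelling out: instead of treating any deviation as terminal, the executive loop detects a failed step and recovers from the current state rather than aborting the whole task. A toy sketch in Python (entirely illustrative, not Skild’s actual stack):

```python
import random

def attempt(step: str) -> bool:
    """Stand-in for a robot skill that occasionally fails (e.g., dropping an eggshell)."""
    return random.random() > 0.3

def run_task(steps, max_retries=3):
    """Execute steps in order; on failure, recover and retry instead of aborting."""
    for step in steps:
        for n in range(1, max_retries + 1):
            if attempt(step):
                print(f"{step}: ok (attempt {n})")
                break
            print(f"{step}: failed, recovering and retrying")
        else:
            raise RuntimeError(f"could not recover on step: {step}")

run_task(["crack egg", "remove shell fragment", "whisk", "pour", "fold omelet"])
```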