About This Week’s Covers

This week’s cover is a tribute to the 35th anniversary of Public Enemy’s classic album, “Fear of a Black Planet.” To make it relevant to artificial intelligence, I changed the theme to fear of the paper clip maximizer. The paper clip maximizer is a well-known AI-ethics thought experiment in which a seemingly innocent request, asking an artificial intelligence system to maximize the efficiency of paper clip production, ends with the system wiping out humanity in pursuit of that goal.

I used ChatGPT to transfer the album style to a new image. I gave it the album cover and added the prompt: “Please take this classic album cover and turn it into a 16:9 cover for my AI newsletter. Keep the image as close as you can to the original but change the words from ‘Fear of a Black Planet’ to ‘Fear of The Paperclip Maximizer.’ The black planet can have a paperclip burned into it from molten lava instead of the person. And ‘Public Enemy’ can be changed to ‘AI News 81:2025/04/25.’ Try to keep the fonts, colors, and styles the same. It’s meant to be immediately recognizable as a take-off on the original image.”

As usual, I’m trying to showcase the skills of AI rather than hand-crafting the perfect cover (which is tempting in Photoshop!).

The category images this week are a demonstration of OpenAI’s new GPT-Image-1 API. I tried to think of something difficult, so I had o3 create a rubric for cartoons similar to The Far Side. Despite everyone thinking that OpenAI has no copyright guardrails, the system would not create anything that included the words “Gary Larson” or “The Far Side,” even with my moderation settings at their lowest. I had to create a prompt that described the illustrative style without mentioning anything specific. I then ran my category names through the rubric, and o3 came up with prompts to generate cartoons for each category. I was pretty impressed with how well it did, considering the result was me clicking one button and getting 40 category images. Instead of giving you the top six this week, I’m sharing the top 18. They are below.

Here’s a full blog post on how o3 created the rubric and methodology that let me submit 40 single-word category names, hit “go,” and get all of these Gary Larson-inspired comics back within a few minutes.

This Week By The Numbers

Total Organized Headlines: 608

This Week’s Executive Summaries

This week is full of news that should shake up publishers and e-commerce executives. I believe we’ve crossed the Rubicon: the quickening pace of all these individual advances, taken together, is going to start to become truly impactful. I predict this will be the last fourth quarter of normalcy online. There will most likely be a line demarcating 2025 as the year everything changed.

Google announced that its AI-enhanced search now reaches 1.5 billion users per month.

A study showed that these AI Overviews reduce click-through rates by 34.5%.

Andrej Karpathy, co-founder of OpenAI, former head of AI at Tesla, and one of the most generous, smartest, kindest, and broadest-visioned voices in AI (and I don’t say that lightly), has said that our primary audience as product developers, publishers, and e-commerce website developers is no longer human. Instead, our primary audience is web crawlers and agents searching on behalf of the people who asked them questions. That’s a completely different usability assignment than traditional user experience. I agree completely with this statement. I’m actually quite excited about it.

OpenAI has expressed interest in purchasing the Google Chrome browser, if given the chance.

The Washington Post has partnered with OpenAI to include news within ChatGPT chats.

Anthropic predicts fully automated AI employees within the next 12 months.

Along those lines, Microsoft predicts that 2025 will be the year that “frontier firms” emerge and AI employees begin to lead operational tasks.

Microsoft has added the ability to “see” to its Edge browser via Copilot Vision. It has also launched computer use in Copilot Studio, letting agents operate desktop and web applications directly. This is a pretty big first step toward the end of the Internet as we know it.

Demis Hassabis, CEO of Google DeepMind, forecasts that AI could implicitly develop a form of self-awareness within the next 5 to 10 years.

Anthropic launched a research feature. This means that Google, OpenAI, Perplexity, Anthropic, and DeepSeek all now have robust research capabilities. This is also a harbinger for the end of the Internet as we know it.

ElevenLabs, the audio AI company, has developed a method to allow voice agents to transfer calls to other voice agents. That’s a spectacular development and it implies that different specialists could take on different tasks and transfer those tasks—just like humans do now. I would say call centers are soon to be an extinct species.

One of the most important scientific endeavors will be learning how AI models think. This is currently one of the most disturbingly opaque areas of the field. The discipline is called interpretability. A company called Goodfire raised $50 million to advance tools that try to explain how models work.

The Harvard Business Review released a study that shows that people are using AI predominantly for emotional and everyday support. Therapy and companionship is the number-one personal use case of artificial intelligence.

Uber is using AI agents to deploy code across the company at an enterprise level.

Nvidia came out with three models of its own:

Describe Anything 3B is able to create highly detailed descriptions of images and videos as well as caption them. It’s especially good at diving into specific elements of an image or a video to give extra detail.

Eagle 2.5 is a small model that has been created for understanding extremely long videos and complex imagery. It’s also really good at understanding charts and graphs.

Nemotron-UltraLong-8B was created with a huge context window, supporting up to 4 million tokens. That means an entire enterprise code base could be loaded into it and analyzed.

It’s important to remember that Nvidia is making models as well as chips. In particular, their robotics program and simulation training software is world-class.

OpenAI came out with a 34-page guide to explain how to make autonomous agents.

OpenAI also came out with Codex CLI, an open-source command-line tool that anyone can download to work directly with local files through a terminal interface.

Most features that we enjoy in the web or app versions of language models like ChatGPT are now being released into the API. This means we can call them programmatically in large batches rather than one session at a time. That is going to create a quickening as scale explodes and usage increases. This week, for example, the latest image-generation tool from OpenAI became available in the API.

In ethics and security news, a new benchmark has shown that OpenAI’s O3 model can outperform 94 percent of expert virologists. That’s relaxing.

The Oscars have officially okayed the use of artificial intelligence, without even requiring disclosure. Talk about a change of heart in less than 12 months.

Adobe has now incorporated Google and OpenAI’s models into its Firefly product. That’s another pretty big shift in ethics.

Google came out with an audio model to help people create music.

A company I had never heard of called Higgsfield launched what looks to be a very competitive video model—up there with Runway or Kling.

For the past two years, I’ve been keeping track of what are called Talking Heads, where a still image can be animated. I’m now up to 351 links. This week a cool company called Tavus came out with a really nice lipsync model called Hummingbird.

All that and much more in this week’s AI news!!!

Google’s AI-Enhanced Search Reaches 1.5 Billion Users Monthly:
Google announced that its AI-powered search summaries now reach 1.5 billion users each month, reflecting strong adoption of AI in everyday searches. Alphabet’s recent earnings report highlights a 10% year-over-year increase in search revenue, totaling $50.7 billion, despite heavy investment in AI development. For marketers, this signals a lasting shift towards AI-driven search results, meaning SEO strategies must adapt to remain effective. Google’s improved profitability also suggests it will continue expanding its AI capabilities aggressively.

Google’s AI Overviews Reach 1.5 Billion Monthly Users https://www.searchenginejournal.com/googles-ai-overviews-reach-1-5-billion-monthly-users/545333/

Google’s AI Overviews Reduce Clickthrough Rates by 34.5%
A new study analyzing 300,000 keywords shows Google’s AI Overviews significantly reduce user clicks. Data reveals that when AI-generated summaries appear in search results, the top-ranking website experiences an average clickthrough rate drop of 34.5%. Despite Google’s claims that AI Overviews drive higher engagement, independent research suggests the opposite: these summaries lead to more zero-click searches by answering user questions directly, similar to featured snippets. As AI Overviews become commonplace, clickthrough rates might decline even further, raising concerns for publishers reliant on search traffic.

AI Overviews Reduce Clicks by 34.5% https://ahrefs.com/blog/ai-overviews-reduce-clicks/

Andrej Karpathy To Everyone: Your Primary Audience Is No Longer Human
“The primary audience of your thing (product, service, library, …) is now an LLM, not a human.” Andrej Karpathy suggests a fundamental shift in how products, services, and websites are designed, favoring simplicity and accessibility for AI models (LLMs) rather than humans. He argues elaborate visual elements like animations, branding, and complex interfaces are becoming obsolete. Instead, straightforward, easily readable text in formats like a single markdown file is essential, as AI models prefer scraping, reading, and direct access over interactive or visual navigation. According to Karpathy, it’s time for digital ergonomics to put AI first.

“PSA It’s a new era of ergonomics. The primary audience of your thing (product, service, library, …) is now an LLM, not a human. LLMs don’t like to navigate, they like to scrape. LLMs don’t like to see, they like to read. LLMs don’t like to click, they like to curl. Etc etc.” / X https://x.com/karpathy/status/1914494203696177444

“Tired: elaborate docs pages for your product/service/library with fancy color palettes, branding, animations, transitions, dark mode, … Wired: one single docs .md file and a “copy to clipboard” button.” / X https://x.com/karpathy/status/1914488029873627597

“@karpathy now API stands for AI Prompt Interface” / X https://x.com/Yuchenj_UW/status/1914495349164851457

OpenAI Eyes Google’s Chrome Browser in Antitrust Shake-Up
OpenAI executive Nick Turley testified this week that the AI firm would seriously consider buying Google’s Chrome browser if it became available. Turley told a federal court that acquiring Chrome would let OpenAI deliver a unique, AI-focused browsing experience, reshaping how users interact online. His comments came during Google’s antitrust trial, where the Justice Department seeks to break Google’s grip on online search by forcing it to sell off Chrome. OpenAI has already hinted at browser ambitions, recently hiring former Chrome developers Ben Goodger and Darin Fisher.

OpenAI would buy Google’s Chrome, exec testifies at trial | Reuters https://www.reuters.com/sustainability/boards-policy-regulation/google-contemplated-exclusive-gemini-ai-deals-with-android-makers-2025-04-22/

“o3’s search abilities are incredible. Can find extremely niche information without a ton of additional context. Just what I would say to a colleague.” / X https://x.com/natolambert/status/1913433658909720880

Washington Post Partners with OpenAI to Feature News WITHIN ChatGPT
The Washington Post has announced a new partnership with OpenAI, enabling ChatGPT to provide summaries, quotes, and direct links to the Post’s news articles.

The Washington Post partners with OpenAI on search content – The Washington Post https://www.washingtonpost.com/pr/2025/04/22/washington-post-partners-with-openai-search-content/

Anthropic Predicts Fully AI Employees Within a Year
Anthropic expects virtual AI employees to begin actively participating in corporate networks within the next year, according to the company’s security leader, Jason Clinton. These virtual employees will go beyond current AI agents by having their own corporate accounts, passwords, and the ability to perform tasks autonomously, significantly increasing potential cybersecurity risks. Companies will need to rapidly update their security strategies to manage these AI identities, particularly around securing account access, monitoring AI actions, and clearly defining accountability. Clinton emphasized the urgent need to solve these security challenges, warning that unmanaged AI identities could inadvertently compromise critical systems. Major firms like Okta have already introduced products to address these new security demands.

Exclusive: fully AI employees are a year away, Anthropic warns https://www.axios.com/2025/04/22/ai-anthropic-virtual-employees-security

Microsoft Predicts 2025 as Breakout Year for AI-Driven ‘Frontier Firms’
A new Microsoft report says 2025 will mark the rise of “Frontier Firms,” companies using AI as core workers alongside humans to transform business operations and productivity. Driven by the urgent need to close the growing gap between business demands and human capacity, these firms blend AI “agents” and human workers into hybrid teams, reshaping traditional organizational charts into fluid, task-driven structures. Microsoft surveyed 31,000 workers and analyzed global data, finding 82% of leaders consider 2025 a critical year to rethink business strategies around AI, with widespread adoption anticipated within 2–5 years. Early adopters already report higher productivity, greater job satisfaction, and less anxiety over AI replacing jobs, emphasizing that companies must quickly adapt or risk falling behind.

2025: The Year the Frontier Firm Is Born https://www.microsoft.com/en-us/worklab/work-trend-index/2025-the-year-the-frontier-firm-is-born

Microsoft Expands AI Assistant Capabilities with Web Browser Vision and Computer Use
Microsoft has introduced Copilot Vision to its Edge browser, allowing the AI assistant to summarize web content aloud and actively collaborate with users in real-time browsing sessions. Additionally, the company announced a new desktop and web automation feature in Copilot Studio. Now in early-access preview, this upgrade enables AI agents to interact directly with websites and desktop applications, effectively turning graphical interfaces into practical tools. These enhancements position Copilot Studio as a versatile AI platform designed to tackle real-world business tasks efficiently.

“Microsoft also started rolling out Copilot Vision in its Edge browser It will read what’s on screen to summarize aloud, working as a real-time collaborator/assistant when browsing the internet. Best part: it’s free—and opt-in (not active by default)! https://x.com/rowancheung/status/1912744244801933726

“Microsoft added computer use to Copilot Studio, allowing users to build agents that can take actions on desktop & web It also launched Copilot Vision in Edge, giving an assistant that can see what the user is browsing (with opt-in) and help out https://x.com/adcock_brett/status/1913986926765277685

“Proud to announce computer use in Microsoft Copilot Studio! Agents can now click, type, and interact with desktop + web apps – no APIs needed. Learn more in our blog: https://x.com/clamanna/status/1912256797974622266

Anthropic CEO Calls for Global Push to Make AI Models Understandable Before They Become Too Powerful
Anthropic CEO Dario Amodei urged the AI community and governments to accelerate research into AI interpretability, the ability to understand how large language models think, before systems reach transformative levels of power, potentially by 2026 or 2027. He warned that while AI capabilities are advancing rapidly, our understanding of how these models make decisions still lags dangerously behind. Amodei detailed progress at Anthropic, including breakthroughs that allow researchers to map and manipulate internal AI features and “circuits” to trace model reasoning. But he emphasized that without broader support, interpretability may not mature in time to mitigate risks like deception, misuse, or lack of legal explainability in high-stakes sectors. He called on AI companies, academics, and governments to treat interpretability as a global priority, likening it to developing an “MRI for AI,” and proposed transparency rules and export controls to buy time for this critical safety work.

Dario Amodei — The Urgency of Interpretability https://www.darioamodei.com/post/the-urgency-of-interpretability

Demis Hassabis Predicts AI Could Gain Implicit Self-Awareness Within a Decade
Demis Hassabis, CEO of Google DeepMind, said current AI systems are not conscious but may develop a form of self-awareness “implicitly” as they advance. He believes future models will need to understand concepts like “you, self, and other” to perform more sophisticated tasks. While today’s AI lacks imagination, Hassabis forecasts that within 5 to 10 years, AI could begin generating and solving complex scientific hypotheses on its own.

“Demis Hassabis says today’s AI lacks consciousness, but self-awareness could emerge “implicitly.” Models may soon need to understand “you, self, and other” — the early elements of awareness. They lack imagination now, but in 5–10 years, Hassabis predicts they’ll solve and pose https://x.com/vitrupo/status/1914167730212855978

Anthropic adds Research and Google Workspace tools to Claude for faster, context-rich productivity
Anthropic has expanded its AI assistant Claude with two major features, Research and Google Workspace integration, designed to boost productivity and decision-making. The new Research tool allows Claude to autonomously search the web and internal content, building multi-step queries to deliver in-depth answers with citations. Meanwhile, Claude’s integration with Gmail, Google Docs, and Calendar gives it access to user emails, documents, and schedules, allowing it to summarize meetings, identify tasks, and surface relevant context without manual uploads. These additions make Claude more useful for professionals, students, and families, handling everything from launch plans to study guides with speed and precision, all while preserving data privacy through enterprise-grade safeguards.

“Anthropic added a Research feature in Claude with Google Workspace integration Research will perform searches across the web and users’ connected work data This data will also include users’ emails, calendars, and docs, thanks to the Workspace link https://x.com/rowancheung/status/1912744161226231940

“Anthropic enhanced Claude with a new ‘Research’ feature and Google Workspace integration Much like OAI’s Deep Research, Research runs searches across the web and users’ data to produce reports With Workspace, it even covers emails, calendars, and docs! https://x.com/adcock_brett/status/1913987016347263203

“New Anthropic research: AI values in the wild. We want AI models to have well-aligned values. But how do we know what values they’re expressing in real-life conversations? We studied hundreds of thousands of anonymized conversations to find out. https://x.com/AnthropicAI/status/1914333220067213529

Claude takes research to new places \ Anthropic https://www.anthropic.com/news/research

“Today we’re launching Research, alongside a new Google Workspace integration. Claude now brings together information from your work and the web. https://x.com/AnthropicAI/status/1912192384588271771

ElevenLabs adds agent-to-agent transfers for more advanced AI call routing
ElevenLabs has introduced a new feature in its Conversational AI platform that enables seamless transfers between AI agents during a call. This allows companies to build multi-agent workflows, where different AI agents, each with its own tools and knowledge, handle specific topics like billing, tech support, or availability. Transfers are triggered by defined conditions and handled via a system tool called transfer_to_agent, available through both the user interface and API. The update supports nested handoffs, encouraging more sophisticated customer support flows. ElevenLabs recommends using GPT-4o models for optimal performance.

“Conversational AI now enables seamless call transfers between agents. This allows different teams within your company to develop specialized agents in parallel, each with its own knowledge base and tools. https://x.com/elevenlabsio/status/1914324212304789729

“ElevenLabs dropped Agent-to-Agent Transfers The capability allows the transfer of conversations (and their context) between specialized agents This can come in handy in cases like support ops where multiple specialized agents are active! https://x.com/rowancheung/status/1914567420846485811

Agent transfer | ElevenLabs Documentation https://elevenlabs.io/docs/conversational-ai/customization/tools/system-tools/agent-transfer
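To make the routing idea concrete, here is a minimal sketch of conditional agent-to-agent handoff in the spirit of ElevenLabs’ transfer_to_agent system tool. The agent IDs, topic names, and dispatch logic below are illustrative assumptions, not real API values; the actual tool is configured through the ElevenLabs UI or API.

```python
# Hypothetical sketch of agent-to-agent transfer routing. Mirrors the idea
# behind a transfer_to_agent system tool: each topic maps to a specialist
# agent, with a fallback generalist. All IDs here are made up for illustration.

TRANSFER_RULES = {
    "billing": "agent_billing_001",       # invoices and payments
    "tech_support": "agent_support_002",  # troubleshooting requests
    "scheduling": "agent_booking_003",    # availability and appointments
}

def transfer_to_agent(topic: str, default_agent: str = "agent_general_000") -> str:
    """Return the ID of the specialist agent that should take over the call."""
    return TRANSFER_RULES.get(topic, default_agent)

print(transfer_to_agent("billing"))   # routes to the billing specialist
print(transfer_to_agent("weather"))   # unknown topic falls back to the generalist
```

Nested handoffs, which the update also supports, would simply mean a specialist agent invoking the same transfer step again mid-conversation.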

Goodfire Raises $50M Series A to Advance Tools That Explain How AI Models Work
Goodfire, a startup founded less than a year ago by former OpenAI and DeepMind researchers, has secured $50 million in Series A funding led by Menlo Ventures to accelerate research in AI interpretability. Backed by investors including Lightspeed, Anthropic, B Capital, and others, the funding will support development of Ember—a platform that reveals the inner workings of AI models by decoding neurons and making their behavior programmable. As AI systems become more complex and harder to control, Goodfire aims to help researchers and companies understand, steer, and improve these models from the inside out. The company is also partnering with frontier labs like the Arc Institute to apply its tools in domains such as biology, language models, and scientific reasoning, with upcoming research previews expected in the coming months.

“Today, we’re announcing our $50M Series A and sharing a preview of Ember – a universal neural programming platform that gives direct, programmable access to any AI model’s internal thoughts. https://x.com/GoodfireAI/status/1912929145870536935

OpenAI’s o3 and o4-mini Models Outperform Most Rivals in New Extended NYT Connections Benchmark
OpenAI’s o3 and o4-mini models posted top-tier scores on a newly expanded benchmark that uses 651 puzzles from The New York Times game “Connections,” now made harder with added decoy words. The o3 (high reasoning) model scored 79.5%, trailing only o1-pro, while o4-mini scored 74.7%. The benchmark was upgraded to challenge large language models beyond the original 436 puzzles, which had begun reaching performance ceilings—o1 previously hit 90.7% there. Human players average a 71% solve rate, but top LLMs like o1-pro are now near-perfect, prompting fresh comparisons with elite human solvers. On the newest 100 puzzles, o3 again stood out with 80%, showing consistent high performance. GPT-4o, Claude, Gemini, and Grok models lagged far behind, with many scoring below 25%, indicating a widening capability gap at the top of the LLM leaderboard.

“OpenAI’s o3 and o4-mini scores on the Extended NYT Connections benchmark This benchmark evaluates large language models (LLMs) using 651 NYT Connections puzzles, with additional words included to increase difficulty. The standard NYT Connections benchmark is nearing saturation, https://x.com/rohanpaul_ai/status/1913927366717342166

HBR Releases 2025 Update on Top 100 Gen AI Use Cases, Highlighting Shift Toward Emotional and Everyday Support
Harvard Business Review revisited its popular generative AI use case study from 2024, analyzing a year’s worth of new data from forums like Reddit and Quora. The updated 2025 report, based on 100 real-world examples, shows a dramatic shift from technical to emotional applications. “Therapy and companionship” rose to the #1 use case, with “organizing my life” and “finding purpose” entering the top five for the first time. AI is increasingly helping users manage their mental health, boost productivity, and make personal decisions—often in ways that feel more like life coaching than tech support. Broader access, lower costs, new tools like custom GPTs and voice-enabled interfaces, and expanded platforms like Google’s NotebookLM all contributed to more sophisticated, emotionally driven use. The report also surfaces growing tensions around overreliance, privacy, and ideological bias—but highlights that users are becoming more skilled and intentional in how they engage with AI tools.

“A new @HarvardBiz article shows a clear shift over the last year in how people are using AI. What started as a workplace tool is quickly becoming a daily companion. People are now turning to AI for everything from personal decision-making to thoughtful conversations, help https://x.com/yusuf_i_mehdi/status/1912995881567260964

How People Are Really Using Gen AI in 2025 https://hbr.org/2025/04/how-people-are-really-using-gen-ai-in-2025

Uber Deploys LangGraph-Powered AI Agents to Automate Unit Testing And Code Migrations Across Engineering Teams
Uber’s Developer Platform AI team is using LangGraph to build a network of custom AI agents that automate unit test generation, helping streamline large-scale code migrations.

“🔥 To tackle large-scale code migrations, Uber’s Developer Platform team uses LangGraph to build a network of agents and automate unit test generation. 🔥 🤖 Uber has established a dedicated Developer Platform AI team to more deeply integrate agentic systems into their https://x.com/LangChainAI/status/1915191956810207431

Nvidia releases Describe Anything 3B model for detailed, localized image and video captioning
Nvidia has launched Describe Anything 3B (DAM-3B), a new multimodal large language model designed to generate highly detailed descriptions of specific regions in images and videos. Users can highlight areas using points, boxes, scribbles, or masks to receive localized captions, with support for both still images and video frames. The system uses a focal prompt mechanism and a gated cross-attention vision backbone to combine full-scene context with fine-grained detail. DAM-3B and its video-capable version, DAM-3B-Video, are now available on Hugging Face.

“Nvidia just released Describe Anything 3B – Multimodal LLM for Detailed Localized Image and Video Captioning ⚡ > integrates full-image/ video context with fine-grained local details using a focal prompt and a localised vision backbone with gated cross-attention DAM-3B > https://x.com/reach_vb/status/1914962078571356656

Nvidia introduces Eagle 2.5, a compact AI model for long videos and complex images:
Nvidia has released Eagle 2.5, a new family of AI models built to understand long videos and detailed images better than many larger systems. One version, Eagle 2.5-8B, performs just as well as big-name models like GPT-4o—even though it’s much smaller and faster. It scored high on a wide range of tests, from video analysis to reading charts and documents. What makes it stand out is how it gets better as you give it more information, thanks to smart training methods that help it handle long and complex inputs. Nvidia also built a new video dataset, Eagle-Video-110K, filled with longer videos and smarter captions to help the model learn. Overall, Eagle 2.5 shows that small models can still do big things when trained the right way.

“Nvidia presents Eagle 2.5! – A family of frontier VLMs for long-context multimodal learning – Eagle 2.5-8B matches the results of GPT-4o and Qwen2.5-VL-72B on long-video understanding https://x.com/arankomatsuzaki/status/1914517474370052425

Nvidia Releases 4M-Token Llama 3.1 Nemotron Models and Wins Math AI Challenge
Nvidia has launched a new series of long-context language models called Nemotron-UltraLong-8B, based on Llama 3.1, capable of handling up to 4 million tokens—enough to input entire codebases at once. Designed for deep sequence processing, the models come in 1M, 2M, and 4M token variants and are trained using a blend of extended pretraining and instruction tuning to maintain performance on benchmarks despite the scale. In parallel, Nvidia also introduced OpenMath Nemotron 32B and 14B, which earned first place in the AIMO-2 math competition, outperforming DeepSeek R1 and others on AIME and HLE-Math. Nvidia has made the models, code, and datasets openly available.

“Keeps getting better: Nvidia also dropped OpenMath Nemotron 32B & 14B – secured FIRST prize in AIMO-2 competition 🤯 > beats DeepSeek R1, QwQ and more on AIME, HLE-Math and more So cool to see Nvidia not just releasing model checkpoints, but also the code and the datasets too https://x.com/reach_vb/status/1915153226145427574

“Wait, Nvidia dropped a 4 MILLION context length Llama 3.1 Nemotron 🤯 could literally drop entire codebases in it! https://x.com/reach_vb/status/1912743420851875986
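To get a feel for what a 4-million-token window means in practice, here is a rough sketch of checking whether a codebase would fit. The four-characters-per-token ratio is a common rule of thumb, not an exact tokenizer count, and the file suffixes are arbitrary examples.

```python
# Rough, hypothetical check of whether a codebase fits Nemotron-UltraLong-8B's
# 4M-token variant. Uses a ~4 chars/token heuristic rather than a real tokenizer.

from pathlib import Path

CONTEXT_BUDGET = 4_000_000  # tokens, per the 4M-token variant

def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly one token per four characters (minimum 1)."""
    return max(1, len(text) // 4)

def codebase_fits(root: str, suffixes=(".py", ".js", ".go")) -> bool:
    """Sum estimated tokens across source files and compare to the budget."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total += estimate_tokens(path.read_text(errors="ignore"))
    return total <= CONTEXT_BUDGET
```

By this heuristic, 4 million tokens is on the order of 16 million characters of source, which is why "entire codebase" analysis suddenly becomes plausible.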

OpenAI Releases Practical 34-Page Guide for Safely Building Autonomous AI Agents
OpenAI published a detailed handbook explaining how to build autonomous AI agents—systems that go beyond basic workflows by making decisions and taking actions on behalf of users. The guide lays out a clear structure: every agent needs a model (its “brain”), tools to interact with the world, and precise instructions to follow. OpenAI emphasizes choosing well-tested tools, using smart models early in development, and being explicit about the agent’s goals to avoid unpredictable behavior. It warns against rushing into multi-agent systems and stresses the need for guardrails to handle sensitive data, block risky actions, and keep humans in the loop. Designed for real-world use, the guide encourages a start-small, scale-smart approach for anyone deploying agents in complex, dynamic environments.

“5/ OpenAI just released a 34-page practical guide to building agents, thanks for sharing @Hesamation https://x.com/AtomSilverman/status/1913372919851614533

“OpenAI just released a 34-page practical guide to building agents, here’s 10 things it teaches us: https://x.com/Hesamation/status/1912916069699793006
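The guide’s model / tools / instructions structure can be sketched in a few lines. The stub “model” below is a stand-in that fakes a routing decision, not a real LLM call or OpenAI’s SDK; tool names and the guardrail logic are illustrative assumptions.

```python
# Minimal, hypothetical sketch of an agent: a model (here a stub), tools it can
# call, and explicit instructions. The "known tools only" check is a simple
# guardrail in the spirit the guide recommends.

def calculator_tool(expression: str) -> str:
    """A well-tested tool: evaluates arithmetic over digits, spaces, dots, + - * /."""
    if not set(expression) <= set("0123456789+-*/. "):
        raise ValueError("unsupported expression")
    return str(eval(expression))

def stub_model(instructions: str, user_input: str) -> dict:
    """Stand-in for an LLM: decides whether to call a tool or answer directly."""
    if any(op in user_input for op in "+-*/"):
        return {"action": "tool", "tool": "calculator", "args": user_input}
    return {"action": "final", "answer": f"({instructions}) I can't help with that."}

TOOLS = {"calculator": calculator_tool}

def run_agent(user_input: str, instructions: str = "Be explicit and safe.") -> str:
    decision = stub_model(instructions, user_input)
    if decision["action"] == "tool":
        return TOOLS[decision["tool"]](decision["args"])  # guardrail: known tools only
    return decision["answer"]
```

Swapping the stub for a real model and growing the tool registry is the start-small, scale-smart path the guide describes.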

OpenAI Releases Codex CLI for AI-Powered Terminal Coding
OpenAI released an open-source Codex CLI tool that brings AI directly into developers’ terminals for real-time coding help. Unlike a typical chatbot, Codex CLI can read and patch files, run shell commands in a sandboxed environment, and iterate on tasks based on feedback. Built with React and Ink for an interactive UI, it integrates tightly with OpenAI’s o4-mini model. Developers can prompt it to perform actions like refactoring code, reviewing file diffs, or executing commands, with safety checks and optional user approvals. Linux users get container-based sandboxing, while macOS uses built-in restrictions. Codex CLI’s workflow relies on a loop that sends prompts, gets model responses, executes approved tasks, and feeds results back for further steps, all from the command line.

“How does the OpenAI Codex CLI work? Yesterday, OpenAI released a open-sourced Codex a “chat-driven development” CLI. It allows developers to use AI models via API directly in their terminal to perform coding tasks. Unlike a simple chatbot, it can read files, write files (via https://x.com/_philschmid/status/1912870519294091726
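The prompt → response → approve → execute → feed back loop described above can be sketched as follows. The fake model, function names, and dictionary shapes are illustrative assumptions, not the actual Codex CLI internals; in the real tool, execution happens in a sandbox.

```python
# Hypothetical sketch of a Codex-CLI-style loop: the model proposes a command,
# the user (optionally) approves it, the result is fed back, and the loop
# continues until the model declares the task done.

def fake_model(history):
    """Stand-in model: proposes one shell command, then finishes."""
    if not any(step["role"] == "result" for step in history):
        return {"type": "command", "command": "echo hello"}
    return {"type": "done", "summary": "Task complete."}

def run_loop(prompt, approve=lambda cmd: True, execute=lambda cmd: f"ran: {cmd}"):
    history = [{"role": "user", "content": prompt}]
    while True:
        step = fake_model(history)
        if step["type"] == "done":
            return step["summary"]
        if approve(step["command"]):               # optional user approval
            result = execute(step["command"])      # sandboxed in the real tool
            history.append({"role": "result", "content": result})
        else:
            history.append({"role": "result", "content": "denied"})
```

Passing `approve=input`-style callbacks or a real subprocess runner would turn this skeleton into an interactive session.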

OpenAI Releases GPT-Image-1 via API for Developers with Enhanced Controls and Higher Rate Limits
OpenAI launched its powerful image generation model, GPT-Image-1, in the API, making the same high-quality tool that went viral in ChatGPT available to developers globally. The API version includes expanded parameter control—developers can now adjust moderation sensitivity, balance image quality versus speed, and customize output formats and backgrounds. The model offers improved image accuracy, diverse styles, precise editing, and better text rendering. In addition, OpenAI doubled rate limits for o3 and o4-mini models for ChatGPT Plus users, expanding access and performance for advanced use cases.

“Image gen is now available in the API! We’re launching gpt-image-1, making ChatGPT’s powerful image generation capabilities available to developers worldwide starting today. ✅ More accurate, high fidelity images 🎨 Diverse visual styles ✏️ Precise image editing 🌎 Rich world https://x.com/OpenAIDevs/status/1915097067023900883

“More cool things about imagegen in the API—developers can control: * moderation sensitivity * image quality/generation speed * quantity of images generated * whether the background is transparent or opaque * output format (jpeg, png, webp)” / X https://x.com/kevinweil/status/1915103388993302646

“a few things are different about the api version than the chatgpt version: you can control moderation sensitivity with the ‘moderation’ parameter you can also control things like quality vs generation speed, background, output format, etc.” / X https://x.com/sama/status/1915111808983282010

“💥 We just launched ChatGPT’s imagegen model in the API! It went viral in ChatGPT, and our API offers even greater parameter control to match diverse developer use cases. Also, we doubled rate limits for o3 and o4-mini for ChatGPT Plus users. Happy Wednesday 🌞” / X https://x.com/kevinweil/status/1915103387592409215

Frontier AI Models Show Alarming Proficiency in Bioweapons-Related Tasks, Outperforming Most Virologists
A new Virology Capabilities Test (VCT) reveals that advanced language models like OpenAI’s o3 possess expert-level tacit knowledge in virology, including the ability to troubleshoot wet lab protocols—skills relevant to bioweapons development. The o3 model outperformed 94% of expert virologists, raising urgent questions about the dual-use risks of frontier AI and the need for stronger safeguards.

The Washington Post partners with OpenAI on search content – The Washington Post https://www.washingtonpost.com/pr/2025/04/22/washington-post-partners-with-openai-search-content/

Oscars OK the Use of A.I., With Caveats
The Academy of Motion Picture Arts and Sciences formally addressed artificial intelligence in its updated Oscar rules, stating that using A.I. tools won’t impact a film’s eligibility for nomination. However, the Academy emphasized a preference for films where human authorship remains central to creative decisions. While it stopped short of requiring filmmakers to disclose A.I. usage, the move signals a shift as A.I. becomes more common in Hollywood. Recent Oscar contenders like The Brutalist, Emilia Pérez, and Dune: Part Two used A.I. for enhancement, fueling debate throughout awards season. The Academy also reinforced its communications policy, banning public criticism of a film’s techniques or subject matter, including on social media.

Oscars OK the Use of A.I., With Caveats – The New York Times https://www.nytimes.com/2025/04/21/business/oscars-rules-ai.html

China Overtakes U.S. in AI Research Leadership Within 8 Years
A comparison of the top 20 AI research institutions shows a dramatic shift from 2015 to 2023. In 2015, U.S.-based organizations held 12 of the top 20 spots, but by 2023, China claimed 11—while the U.S. dropped to just 4. The UK and Japan now hold fewer spots, Germany and France account for only a few, and Canada has dropped off the list entirely. The shift signals China’s growing independence from U.S. academic institutions and intellectual property, reshaping global dominance in AI research.

“Look at this. Top 20 research leaders: 2015 period: 12 USA, 2 Japan, 2 UK, 2 China, 1 Canada, 1 Switzerland 2023 period: 11 China, 4 USA, 2 Germany, 1 France, 1 Japan, 1 UK did you realize that the ranking flipped in 8 years? That they *no longer depend on* US schools or “IP”? https://x.com/teortaxesTex/status/1915120490768998524

xAI Adds Memory to Grok as It Pushes Toward $25B Raise and Next-Gen Supercomputer
Elon Musk’s AI company xAI has begun beta testing a memory feature in its Grok assistant, enabling it to recall past chats and deliver personalized responses—mirroring capabilities found in ChatGPT. Users can also manually delete specific conversations using a new “forget” button. Meanwhile, xAI is reportedly seeking up to $25 billion in funding to support development of its massive Colossus 2 supercomputer, expected to house one million NVIDIA GPUs at a projected cost exceeding $35 billion. On the product front, the new Grok 3 family is now available via API, with Grok 3 Mini delivering competitive reasoning performance at a fraction of the cost, and Grok 3 positioning itself as a leading non-reasoning model for sectors like law, finance, and healthcare.

“xAI just shipped a big Grok update! The AI assistant’s voice mode now has vision capabilities, allowing users to ask about anything seen from their camera feed (much like Gemini Live) It also brings support for multilingual audio and real-time search! https://x.com/rowancheung/status/1914930045807632773

“Elon Musk’s xAI started rolling out a memory feature into its Grok assistant (in beta) Just like ChatGPT, Grok will reference past chats to provide personalized answers. There’s also a dedicated “forget” button to exclude specific chats from its memory https://x.com/rowancheung/status/1912744290779816309

Elon Musk’s xAI Reportedly Looking To Raise As Much As $25 Billion As It Continues Work On The Colossus 2 Supercomputer That Is Expected To House 1 Million NVIDIA GPUs At A Cost Of Over $35 Billion https://wccftech.com/elon-musk-xai-reportedly-looking-to-raise-as-much-as-25-billion-as-it-continues-work-on-the-colossus-2-supercomputer-that-is-expected-to-house-1-million-nvidia-gpus-at-a-cost-of-over-35-billion/

“Meet the Grok 3 family, now on our API! Grok 3 Mini outperforms reasoning models at 5x lower cost, redefining cost-efficient intelligence. Grok 3, the world’s strongest non-reasoning model, excels in tasks that need real world knowledge like law, finance, and healthcare. https://x.com/xai/status/1913308977477353582

Adobe Expands Firefly App with OpenAI and Google Models, Launches New Image, Video, and Vector Tools
Adobe announced major updates to its Firefly generative AI platform during the MAX London conference, including integration of image models from OpenAI and Google Cloud alongside its own commercially safe Firefly models. The release includes Firefly Image Model 4 and Ultra for photorealistic image generation, a production-ready Firefly Video Model, and a new Text-to-Vector feature. Adobe also introduced Firefly Boards, now in public beta, as a collaborative space for moodboarding and rapid iteration. With support from brands like PepsiCo and Accenture, Firefly’s growing ecosystem now allows creators to switch between models with full transparency and produce assets across Adobe’s suite. APIs for text-to-video, image editing, and avatar creation are also rolling out in beta as part of Firefly Services for business workflows.

Adobe adds AI models from OpenAI, Google to its Firefly app | Reuters https://www.reuters.com/business/adobe-adds-ai-models-openai-google-its-firefly-app-2025-04-24/

Baidu Activates 30,000-Chip Kunlun Cluster to Train DeepSeek-Scale AI Models
At its annual developer conference, Baidu CEO Robin Li announced that the company has powered up a cluster of 30,000 P800 Kunlun chips, built in-house, to support training large-scale AI models similar to DeepSeek. The system can handle hundreds of billions of parameters or enable thousands of customers to fine-tune smaller models simultaneously. Already adopted by Chinese banks and internet firms, the chip rollout underscores Baidu’s push into AI infrastructure. The company also introduced Ernie 4.5 Turbo and reasoning model Ernie X1 Turbo, claiming they rival leading global benchmarks. Baidu plans to integrate these AI models across its apps, emphasizing that practical applications, not just models or chips, will drive the future of AI in China’s increasingly competitive tech landscape.

Baidu launches new AI model amid mounting competition | Reuters https://www.reuters.com/world/china/baidu-launches-new-ai-model-amid-mounting-competition-2025-04-25/

AI Supercomputer Growth Accelerates: Private Sector Dominates as Costs, Power Needs, and Scale Skyrocket
The performance of top AI supercomputers is doubling roughly every 9 months, driven by more chips deployed and higher performance per chip, both increasing about 1.6x annually. Once rare, clusters with over 100,000 AI chips are now common among leading tech companies. This scale-up has made systems exponentially more expensive, with hardware costs doubling each year; xAI’s Colossus alone is estimated at $7 billion. Power demands are also doubling annually, with the most powerful system today consuming 300 megawatts, enough to power 250,000 homes. While energy efficiency is improving (1.34x/year), it’s not enough to curb total consumption. Ownership has shifted dramatically since 2019, with private companies now holding over 80% of AI compute capacity globally. If trends continue, by 2030, a leading system may require 2 million chips, cost $200 billion, and draw as much power as nine nuclear reactors. The U.S. leads in global AI supercomputer performance with 75%, followed by China at 15%, though data coverage is limited.
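The two 1.6x/year factors cited above compound to roughly the 9-month doubling time, which is easy to verify:

```python
import math

chips_growth = 1.6  # more chips deployed per year
perf_growth = 1.6   # higher performance per chip per year
total_growth = chips_growth * perf_growth  # ~2.56x FLOP/s per year

# Doubling time in months: solve total_growth ** (t / 12) == 2 for t.
doubling_months = 12 * math.log(2) / math.log(total_growth)
print(round(doubling_months, 1))  # ~8.8 months, matching "every 9 months"
```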

“Performance has grown drastically – FLOP/s of leading AI supercomputers have doubled every 9 months. Driven by: – Deploying more chips (1.6x/year) – Higher performance per chip (1.6x/year) Systems with 10,000 AI chips were rare in 2019. Now, leading companies have clusters 10x https://x.com/EpochAIResearch/status/1915098225465245931

Google Showcases Real-Time AI Agents and Launches Gemini 2.5 Flash with Cost Controls
At Google Cloud, teams demonstrated how next-gen AI agents will function using Gemini 2.0 Flash and the new Live API. These agents operate in real time, are multimodal, and can switch between video and audio calls. They access user identity, shopping carts, and can perform tasks like scheduling and searching with built-in tools, supported by human-in-the-loop systems like Salesforce. Meanwhile, Google DeepMind released Gemini 2.5 Flash—a hybrid reasoning model with adjustable “thinking budgets” to balance performance and cost. It excels in reasoning, STEM, and visual tasks, and is now live across Google AI Studio, Vertex AI, and the Gemini app. Other launches include AI Studio Apps for rapid web app development, video generation via Veo 2, and new context caching tools that cut costs by up to 75%.

“This will be the future! 🤯 At Google Cloud we showed how we think the next Generation of Agents will work, personalized real-time and Multimodal! All powered @GoogleDeepMind Gemini 2.0 Flash and the Live API. Technical Details: 🗣️ Build with Gemini 2.0 Flash LIVE API and ADK. https://x.com/_philschmid/status/1915360039570739283

“Gemini 2.5 Flash just dropped. ⚡ As a hybrid reasoning model, you can control how much it ‘thinks’ depending on your 💰 – making it ideal for tasks like building chat apps, extracting data and more. Try an early version in @Google AI Studio → https://x.com/GoogleDeepMind/status/1912966489415557343

“We were busy over eastern, here are some updates you might have missed, we shipped: – @GoogleDeepMind Gemini 2.5 Flash, dynamic thinking and the best cost-performance Gemini Model released – Context Caching for Gemini 2.0 Flash, 2.5 Pro (2.5 Flash coming), ~75% cost reduction https://x.com/_philschmid/status/1914571503397396956

“📣 It’s here: ask Gemini about anything you see. Share your screen or camera in Gemini Live to brainstorm, troubleshoot, and more. Rolling out to Pixel 9 and Samsung Galaxy S25 devices today and available for all Advanced users on @Android in the Gemini app: https://x.com/GeminiApp/status/1909215393186472380

“Google continues to cook—this time dropping Gemini 2.5 Flash, a hybrid reasoner AI —Matches 04-mini —Strong across reasoning, STEM, and visual reasoning —Includes ‘thinking budget’ to optimize for cost —Available via Google AI Studio, Vertex, Gemini app https://x.com/rowancheung/status/1914201233822253336

Google releases desktop-friendly Gemma 3 with major memory reduction breakthrough
Google has released a new version of its 27-billion-parameter Gemma 3 model that can now run on a single consumer-grade GPU, such as the NVIDIA RTX 3090. Using a technique called Quantization-Aware Training (QAT), Google has drastically reduced the model’s memory requirements without sacrificing performance. A model that previously needed a data-center H100 GPU is now accessible to developers on high-end desktop hardware, marking a major step toward broader use of large-scale AI models outside data centers.
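Rough memory arithmetic shows why quantization is what makes the 24 GB RTX 3090 viable for a 27B model. The exact formats Google ships may differ; 16-bit and 4-bit weights are assumed here for illustration:

```python
params = 27e9  # Gemma 3 27B parameter count

def weight_gb(bits_per_param):
    # Weight memory in GB for a given precision (ignores activations/KV cache).
    return params * bits_per_param / 8 / 1e9

bf16 = weight_gb(16)  # 16-bit weights: ~54 GB, overflows a 24 GB card
int4 = weight_gb(4)   # 4-bit quantized weights: ~13.5 GB, fits with headroom

print(round(bf16, 1), round(int4, 1))
```

QAT matters because it simulates this low-precision rounding during training, so the quantized weights lose far less quality than naive post-training rounding would.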

“Just announced new versions of Gemma 3 – the most capable model to run just one H100 GPU – can now run on just one *desktop* GPU! Our Quantization-Aware Training (QAT) method drastically brings down memory use while maintaining high quality. Excited to make Gemma 3 even more https://x.com/sundarpichai/status/1913260423622656432

“Google also released a new version of Gemma 3, optimized with ‘Quantization-Aware Training’ This reduces the 27B model’s memory requirements, enabling it to run on consumer GPUs, like the NVIDIA RTX 3090, with maintained performance https://x.com/rowancheung/status/1914201267053777278

Perplexity to Testify in Google DOJ Antitrust Case, Advocates for Openness in Android Ecosystem
AI company Perplexity is slated to testify in the ongoing Department of Justice antitrust trial against Google, weighing in on the future of Chrome and Android. While Perplexity supports keeping Chrome under Google’s control—citing its high-quality execution and the benefits of open-sourcing Chromium, which now powers browsers like Microsoft Edge and Perplexity’s own Comet—it is calling for greater openness in the Android ecosystem. The company argues that consumers should be free to choose their default search engine and voice assistant without restrictions tied to access to the Play Store or core Google apps like Maps and YouTube.

“Perplexity has been asked to testify in the Google DOJ case. Our core points: 1. Google should not be broken up. Chrome should remain within and continue to be run by Google. Google deserves a lot of credit for open-sourcing Chromium, which powers Microsoft’s Edge and will also” / X https://x.com/AravSrinivas/status/1914373458982805888

Google DeepMind adds real-time controls and creative tools for musicians with Lyria 2 model
Google DeepMind has released a major update to its Music AI Sandbox, an experimental suite of tools designed for professional musicians. Powered by the new Lyria 2 model, the platform now enables artists to generate unique tracks, continue existing compositions, and reshape audio using text prompts and fine-tuned controls. Features like “create,” “extend,” and “edit” help users explore different moods, genres, tempos, and vocal styles with high-fidelity output. Lyria 2 delivers professional-grade audio quality, while a new component called Lyria RealTime introduces live music performance and control, offering musicians a powerful way to compose and iterate on the fly.

“*Taps mic* Is this thing on? 🎙️ We’re excited to introduce new features in Music AI Sandbox, a set of experimental tools for professional musicians. Powered by our latest model Lyria 2, they’re helping singer-songwriters like Isabella Kensington create their next masterpiece. https://x.com/GoogleDeepMind/status/1915421048171856047

“1️⃣ The create feature in Music AI Sandbox enables generation of multiple unique tracks to spark creativity. 🪄 Choose vocal style 🎤, input your lyrics ✍️, fine-tune tempo and key ♯ using the improved advanced controls. https://x.com/GoogleDeepMind/status/1915421050768081285

Higgsfield Launches Turbo Video Model with Faster Renders, Lower Costs, and New Motion Styles
Higgsfield has released Turbo, its fastest AI video model to date, offering 1.5x faster rendering speeds and a 30% cost reduction. The update includes priority queue access and seven new motion styles—ARC, JIB, DOUBLE DOLLY, FACE PUNCH, and STATIC among them—designed to support quicker iteration for creative professionals.

“Introducing Turbo – our fastest model yet. 1.5x faster renders. 30% cheaper. Priority queue. Perfect for rapid iteration. Oh, and we’re dropping 7 brand-new motion styles too. Pros are already using it. 🧩 1/n https://x.com/higgsfield_ai/status/1915154497426309121

Tavus launches Hummingbird-0, a high-accuracy zero-shot lipsync model outperforming competitors
AI video platform Tavus has released Hummingbird-0, its most precise zero-shot lipsync model to date, now available in research preview via @FAL. Originally developed as part of the Phoenix-3 full-face renderer project, Hummingbird stood out for its speed, realism, and accuracy, prompting a standalone launch. Independent tests show it surpasses both open and closed source models in lipsync precision, identity preservation, and overall realism—at a lower cost. Tavus’s service lets users sync new audio to existing videos using a simple API, with support for MP4 and MP3/WAV formats under five minutes in length, ideal for creating personalized or professional-grade content.

“We just dropped a new SoTA lipsync model on @FAL: Hummingbird-0 Available now as a research preview, it’s the most accurate zero-shot lipsync model we’ve tested, open or closed source. https://x.com/heytavus/status/1915435703833641231

2 AI Visuals and Charts: Week Ending April 25, 2025

“Even book a ride to the airport with multi-app actions. https://x.com/perplexity_ai/status/1915064591668895999

“T1 showing off its soccer moves – a four-foot-tall robot from Chinese company Booster Robotics. https://x.com/TheHumanoidHub/status/1914721286653370492

Top 66 Links of The Week – Organized by Category

AGI

“Demis Hassabis appeared on 60 Minutes, sharing new insights into AI: —AGI in 5-10 years, with conscious AI likely in the future —AI fast-tracking drug discovery, potentially eradicating all disease within the next decade —AI ensuring radical abundance https://x.com/rowancheung/status/1914567398885147089

“i think this is gonna be more like the renaissance than the industrial revolution” / X https://x.com/sama/status/1913320105804730518

“A key point about working with AI. It is a weirdly hard technology to understand intuitively & it is easy to believe it can’t do something when it can with more trials or effort. Thus, evidence that AI can do a task with reasonable consistency beat demos showing it can’t.” / X https://x.com/emollick/status/1913282310159360033

“Interesting how much specifically instructions are given for making games. AI labs are optimizing for viral use cases. https://x.com/emollick/status/1913758861409812801

“It seems like a big question in AI right now is whether there is a flywheel in AI development & when it kicks in, either because having advanced AIs let you code & train better AIs, or due to other returns to experience If/when it does, all non-frontier labs fall behind forever.” / X https://x.com/emollick/status/1913357569034272892

“This keeps coming up, but, just in case you were wondering, being polite seems to have no effect on answer quality in aggregate. It greatly increases the quality of particular answers while greatly lowering the quality of others, and it is not possible to know which in advance. https://x.com/emollick/status/1912677033731039635

“More evidence that o3 represents a big move forward, this time on ARC-AGI. https://x.com/emollick/status/1914798775840706690

ARVR

Road to VR on X: “Report: Apple CEO “cares about nothing else” Than Building Breakout AR Glasses Before Meta See more 👉https://t.co/YAf6q53ySS https://t.co/RO1g8n0t2t” / X

“when your geometry sucks but you have RTX 5090 graphics” / X https://x.com/bilawalsidhu/status/1914438972639506453

“Alibaba just announced Uni3C on Hugging Face Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation https://x.com/_akhaliq/status/1914619143925432338

AgentsCopilots

“Now that AI driven search works pretty well using o3, the implications are pretty big… https://x.com/emollick/status/1914366069160206482

“Lovable just reached $40M ARR in 5 months. But more importantly: We’ve now helped 1M+ people build their idea. This is why non-technical people use lovable: //1” / X https://x.com/antonosika/status/1912923281012994553

“A potential issue with o3 is that it thinks it is using tools even when it does not, leading to some hallucinations where it assumes work that was implied in the reasoning chain was actually done. You should double check the reasoning trace for complex work to see what it did.” / X https://x.com/emollick/status/1912889596125274283

Deck raises $12M to ‘Plaid-ify’ any website using AI | TechCrunch https://techcrunch.com/2025/04/16/deck-raises-12m-to-plaid-ify-any-website-using-ai/

“”A practical guide to building agents” from @OpenAI It’s a guide for product and engineering teams building their first AI agents, covering: – Use case selection – Agent design patterns – Best practices for safe, reliable deployment – And other foundation you need to start https://x.com/TheTuringPost/status/1913002164475351212

“Introducing Perplexity iOS Voice Assistant Voice Assistant uses web browsing and multi-app actions to book reservations, send emails and calendar invites, play media, and more—all from the Perplexity iOS app. Update your app in the App Store and start asking today. https://x.com/perplexity_ai/status/1915064472391336071

“Perplexity Voice Assistant can search for and play podcasts, YouTube videos, and other media. https://x.com/perplexity_ai/status/1915064511771578621

Amazon

“This is a great example! Building Agents with Amazon Nova Act and MCP 👉 How to build intelligent web automation agents using Amazon Nova Act integrated with Model Context Protocol (MCP) https://x.com/danilop/status/1911773475032764798

Anthropic

“INCREDIBLE!! An MCP server to browse the web like humans! Stagehand by @browserbasehq allows Agents to independently control a browser. Integrating its MCP server with Claude delivers a more reliable alternative to OpenAI Operator. 100% open-source with 10k stars! https://x.com/_avichawla/status/1912030666130333906

Audio

“It takes less time for AI to generate a song than for a person to listen to it. It seems like a good standard for “AI is above human level in music” to be whether or not someone is willing to listen only to an all AI-generated channel. I don’t think anyone is doing that, though?” / X https://x.com/emollick/status/1914029182885265741

“California-based Play AI released an AI voice changer The tool allows users to change their voice into ANYONE else’s with just 10 seconds of audio Accessible on mobile, it even preserves the same voice and tone of the original recording! https://x.com/rowancheung/status/1914567423916638560

“Spotify just announced ViSMaP on Hugging Face Unsupervised Hour-long Video Summarisation by Meta-Prompting https://x.com/_akhaliq/status/1915703054701044209

BusinessAI

“We’ve been collaborating closely with developers to understand where image gen can be most useful in the real world. Here are some examples from early adopters across domains like creative tools, consumer apps, enterprise software, and more below. 👇” / X https://x.com/OpenAIDevs/status/1915097072107143322

How People Are Really Using Gen AI in 2025 https://hbr.org/2025/04/how-people-are-really-using-gen-ai-in-2025

Business guides and resources | OpenAI https://openai.com/business/guides-and-resources/

OpenAI Forecasts Revenue Topping $125 Billion in 2029 as Agents, New Products Gain — The Information https://www.theinformation.com/articles/openai-forecasts-revenue-topping-125-billion-2029-agents-new-products-gain

Famed AI researcher launches controversial startup to replace all human workers everywhere | TechCrunch https://techcrunch.com/2025/04/19/famed-ai-researcher-launches-controversial-startup-to-replace-all-human-workers-everywhere/

Introducing Embed 4: Multimodal search for business https://cohere.com/blog/embed-4

ChipsHardware

“How quickly are AI supercomputers scaling, where are they, and who owns them? Our new dataset covers 500+ of the largest AI supercomputers (aka GPU clusters or AI data centers) over the last six years. Here is what we found🧵 https://x.com/EpochAIResearch/status/1915098223082873015

EducationAI

Advancing Artificial Intelligence Education for American Youth – The White House https://www.whitehouse.gov/presidential-actions/2025/04/advancing-artificial-intelligence-education-for-american-youth/

EthicsLegalSecurity

Overview – C2PA An open technical standard providing publishers, creators, and consumers the ability to trace the origin of different types of media. https://c2pa.org/

UAE set to use AI to write laws in world first https://archive.md/ubZAW

“How much electricity do you burn with each request to your chatbot? Now you know. Not to induce guilt, but to empower smarter, more sustainable conversations.” / X https://x.com/fdaudens/status/1915025543696716234

Generative AI is learning to spy for the US military | MIT Technology Review https://www.technologyreview.com/2025/04/11/1114914/generative-ai-is-learning-to-spy-for-the-us-military/

“3/ One former operator who visited an AI data center with us quickly sketched out a $30k attack that could knock the entire $2B+ facility offline for over 6 months. Vulnerabilities like this exist up and down the stack.” / X https://x.com/harris_edouard/status/1914676824799432812

“Power requirements are following a similar trajectory, doubling every year. Today’s most powerful system requires 300 MW—equivalent to about 250,000 households. https://x.com/EpochAIResearch/status/1915098229860880462

“Very tight control over AI “memory” is really critical to productive AI use. If you don’t know what is in the context window, the AI responses can be contaminated in ways that make it more sycophantic or biased, or lead to sandbagging. It also can make it hard to share prompts.” / X https://x.com/emollick/status/1915171354552553607

“”wow 0.06% per book, so with just 1667 books we should get 100%!” You’re either: (a) poor at stats (b) never ran experiments (c) intentionally obtuse/just memeing. I’ll give you the benefit of the doubt and assume it’s (c). Think about it: what experiment needs to be conducted https://x.com/giffmana/status/1914245144422776906

Wikipedia is giving AI developers its data to fend off bot scrapers | The Verge https://www.theverge.com/news/650467/wikipedia-kaggle-partnership-ai-dataset-machine-learning

Google

Achieve real-time interaction: Build with the Live API – Google Developers Blog https://developers.googleblog.com/en/achieve-real-time-interaction-build-with-the-live-api/

“We just shipped Tier 3 limits in the Gemini API (2x-6x higher limits), available now for self serve upgrade if you have spent >=$1K with Google Cloud. You can upgrade in AI Studio (on the API key page) to Tier 3 and start scaling with Gemini right now! https://x.com/OfficialLoganK/status/1915119791506915812

Imagery

“New foundation model on image and video captioning just dropped by @NVIDIAAI 🔥 Describe Anything Model (DAM) is a 3B vision language model to generate detailed captions with localized references 😮 The team released the models, the dataset, a new benchmark and a demo 🤩 https://x.com/mervenoyann/status/1914980803055862176

“Adobe announced DRAGON on Hugging Face Distributional Rewards Optimize Diffusion Generative Models https://x.com/_akhaliq/status/1914602497148154226

“Don’t sleep on this! 🔥 @Meta dropped swiss army knives for vision with A2.0 license ❤️ > image/video encoders for vision language and spatial understanding (object detection etc) > VLM outperforms InternVL3 and Qwen2.5VL 🔥 > Gigantic video and image datasets 👏 https://x.com/mervenoyann/status/1915723394701467909

Adobe Firefly: The next evolution of creative AI is here | Adobe Blog https://blog.adobe.com/en/publish/2025/04/24/adobe-firefly-next-evolution-creative-ai-is-here

Mobile

Perplexity: Announcing Our Global Partnership with Motorola https://www.perplexity.ai/hub/blog/announcing-our-global-partnership-with-motorola

Multimodality

“Adversarial patches fool computer vision systems, and retraining models for defense is impractical. This paper introduces a training-free Visual RAG (VRAG) framework using Vision-Language Models (VLMs) to detect these patches by retrieving similar known attacks from a database. https://x.com/rohanpaul_ai/status/1914942566828589225

“Cohere released Embed 4, a SOTA multimodal embedding model to add frontier search and retrieval capabilities to AI apps —128K-token context window —Supports 100+ languages —Optimized for data from regulated industries —Up to 83% savings on storage costs https://x.com/rowancheung/status/1912744190947057969

“LiveCC just dropped on Hugging Face Learning Video LLM with Streaming Speech Transcription at Scale video LLM capable of real-time commentary, trained with a novel video-ASR streaming method, SOTA on both streaming and offline benchmarks. https://x.com/_akhaliq/status/1915094398364197101

OpenAI

“OpenAI’s new o3 model hallucinates significantly more than older releases! The findings come from third-party testing, where the AI was caught lying about actions it never took! OpenAI’s system card also says o3 seems to hallucinate >2x more than o1 https://x.com/rowancheung/status/1914201289988227506

“Turns out, LLMs represent numbers on a helix and use trigonometry to do addition. A new paper reverse engineers addition in models like GPT-J-6B and finds a “Clock” algorithm. Numbers are encoded using sine and cosine terms, then added like angles. https://x.com/LiorOnAI/status/1914334179929530660

“killer story from @srimuppidi: openai has upped its projections even more from last fall. key drivers include agents and “free user monetization.” that second driver could refer to channels including advertising or affiliate fees: https://x.com/steph_palazzolo/status/1915095524677505049

“Deep Research (lightweight version) is now available in the ChatGPT free tier:” / X https://x.com/gdb/status/1915637620731941188

“I’m hearing more and more stories of ChatGPT helping people fix longstanding health issues. We still have a long way to go, but shows how AI is already improving people’s lives in meaningful ways. https://x.com/gdb/status/1914106403574452496

Robotics

“Kuafu robot from Leju Robotics completed a 5K practice run in preparation for the Beijing Half Marathon on Saturday. https://x.com/TheHumanoidHub/status/1912911510248235361

“Unitree Robotics has opened a new 107,000-square-foot factory in Hangzhou, just 15 minutes from its headquarters in eastern Zhejiang province. The facility is expected to support the company’s expansion over the next three to five years. https://x.com/TheHumanoidHub/status/1914950738070983134

“Modern AI models don’t need a strict set of rules to work together — they can figure it out themselves ▪️ Hogwild! inference by @yandexcom is an interesting approach to AI teamwork. It lets multiple instances of the same LLM run in parallel. This is done using shared Key-Value https://x.com/TheTuringPost/status/1913729366976332153

Stumbling and Overheating, Most Humanoid Robots Fail to Finish Half-Marathon in Beijing | WIRED https://www.wired.com/story/beijing-half-marathon-humanoid-robots/

“Apptronik CEO Jeff Cardenas says elderly care may be the biggest opportunity for humanoids, given the aging population and labor shortage. “My dream is that my parents will have a robot that helps take care of them so they can age with dignity.” https://x.com/TheHumanoidHub/status/1915184047267226026

“Tiangong robot won the Beijing half marathon with a time of 2 hours 40 minutes – more than twice the time of the men’s race winner. The robot is built by the Beijing Humanoid Robot Innovation Center. https://x.com/TheHumanoidHub/status/1913521470397112555

ScienceMedicine

Johnson & Johnson Pivots Its AI Strategy – WSJ https://archive.md/EzUQD

“We built an AI model to simulate how a fruit fly walks, flies and behaves – in partnership with @HHMIJanelia. 🪰 Our computerized insect replicates realistic motion, and can even use its eyes to control its actions. Here’s how we developed it – and what it means for science. 🧵 https://x.com/GoogleDeepMind/status/1915077085325922785

AI assisted search-based research actually works now https://simonwillison.net/2025/Apr/21/ai-assisted-search/

“For the first time, I truly believe AI might replace doctors. I felt dizzy every morning when standing up from bed, and even worse after moving to a hill in SF, the dizziness hit me every time I climbed. Visited my primary care doctor twice; both times, they said it’s normal,” / X https://x.com/Yuchenj_UW/status/1914000352606818419

TechPapers

“Dynamic Early Exit in Reasoning Models – Allows LLMs to self-truncate CoT sequences by dynamic early exit – Reduces the CoT length by ~35% while improving accuracy by 1% – 10%. https://x.com/arankomatsuzaki/status/1914889033085542537

Video

“More evidence that scaling data diversity across tasks, robots and modalities can unlock real-world generalization. Physical Intelligence’s π0.5 model enables robots to autonomously perform tasks in unseen homes, handling both high level reasoning and low level control. https://x.com/TheHumanoidHub/status/1914743233743347883
