About This Week’s Covers

This week’s cover is dedicated to the six-year anniversary of the seminal tech essay “The Bitter Lesson,” written by Rich Sutton on March 13, 2019. The Bitter Lesson in AI research is that training models simply by using more computing power and large amounts of data outperforms trying to incorporate human knowledge or adding subject matter guidance, despite our intuitions to the contrary. The “bitter lesson” is that power and data are all the computer needs to beat a human. The less guidance, the better. It’s a blow to our egos. The main cover depicts embodied robots running a lemonade stand to serve humanity their bitter lesson.

This week’s category covers all feature bitter lemon themes in homage to the bitter lesson. Created automatically using prompts written by Claude and Ideogram.

This Week By The Numbers

Total Organized Headlines: 511

This Week’s Executive Summaries

Last week was another busy week in AI news, with 15 stories meriting executive summaries.

OpenAI is pushing the pace of agent adoption (AIs that can take action on your behalf) with the addition of commands that can help developers build agents that can search the web, look through local files, and even control your computer. Expect a lot of products to be released in the next six months. In fact, OpenAI is so confident in its agents that it has now increased the potential monthly fee to $20,000 per month for PhD-level research agents.

Legal AI provider, Harvey, has built agents that are outperforming humans at law firms.

Another breakthrough product comes out of China called Manus, with building hype coming off of the previous week’s announcement. This is a product that is built on top of Anthropic’s Claude that has 29 additional agent tools, like browser control. It’s continued evidence that building the AI model is not necessarily the endgame but rather building products on top of it.

Infamous Taiwanese manufacturer Foxconn has launched an AI model that is built to improve manufacturing and supply chain computing. Bank of America came out with a report that says humanoid robots are on their way.

Google released a small model that can be run on a single graphics chip or a phone that outperforms the best models of last year. And they’re giving it away for free. Google also launched two impressive AI models that improve robot embodiment in the real world. Google Gemini can now integrate with your search history to give personalized responses. Google Gemini can now ingest and process YouTube videos, including understanding the context, plot lines, and transcription.

Meta announced it is going to build its own in-house computing chips for artificial intelligence.

The Pentagon signed a deal to use AI agents for military planning.

Ad agency juggernaut WPP is partnering with Stability AI for marketing and entertainment production.

TechCrunch has promised to maintain an AI glossary to help lay people keep track of all the terms in the industry.

In addition to the 15 executive summaries, be sure to check out the 13 top visuals of the week. These speak louder than words. Finally, you can find the top 38 of the 511 headlines from last week. All of these are hand-sorted and chosen by me, so hopefully I’ve done the work and you don’t have to. Have a great week!

OpenAI Quickens The Pace of Change with New Agents API
OpenAI introduced the Responses API, making it easier to build AI agents. The tool combines the simplicity of chat with the capabilities of assistants. Developers can now add web search, file search, and computer control with just a few lines of code. The system also handles image analysis for tasks like scanning receipts and finding objects in photos. The sum is going to be greater than the parts as thousands of people start building small agents to handle specific tasks.

“Our new API primitive: the Responses API. Combining the simplicity of Chat Completions with the tool-use of Assistants, this new foundation provides more flexibility in building agents. Web search, file search, or computer use are a couple lines of code! https://x.com/OpenAIDevs/status/1899531367056064814

“this is a tremendous accomplishment from the team. i am obviously biased but i think it’s one of the most well-designed and useful APIs ever, and that people will really love it.” / X https://x.com/sama/status/1899579431905305027

“OpenAI’s Agents SDK is Transforming AI Agent Development” / X https://x.com/AtomSilverman/status/1899862633341477221

“We’re launching new tools to help developers build reliable and powerful AI agents. 🤖🔧 Timestamps: 01:54 Web search 02:41 File search 03:22 Computer use 04:07 Responses API 10:17 Agents SDK https://x.com/OpenAIDevs/status/1899531225468969240

“🤖 Agents SDK—our new open-source SDK for orchestrating multi-agent workflows, improving upon Swarm. Configure agents with built-in tools, hand off tasks, add safety guardrails, and visualize execution traces for debugging and optimizing performance. https://x.com/OpenAIDevs/status/1899531857143972051

“🤖 OpenAI Responses API Yesterday OpenAI released a new Responses API, including built-in tools and management of conversation state. You can now build with these features in LangChain! Try it out with `pip install langchain-openai==0.3.9rc1` Docs in 🧵 https://x.com/LangChainAI/status/1899888134793683243

“OpenAI announces new Responses API for easily building AI Agents 1. Responses API A single API that unifies Chat Completions and tool usage. Build agents with multiple “turns” in one request. Built-In Tools: – Web Search Tool: Get timely answers with clear citations. Uses a https://x.com/scaling01/status/1899510452473790537

“OpenAI released new DIY tools for building custom AI agents, including: A CUA model-powered Responses API with tools for web browsing, computer use, and file management An open-source Agents SDK to orchestrate single and multi-agent systems https://x.com/rowancheung/status/1899713252637991092

“Introducing the Responses API: the new primitive of the OpenAI API. It is the culmination of 2 years of learnings designing the OpenAI API, and the foundation of our next chapter of building agents. 🧵Here’s the story of how we designed it: https://x.com/athyuttamre/status/1899541471532867821

(3) ⚡️The new OpenAI Agents Platform – Latent.Space https://www.latent.space/p/openai-agents-platform

“OpenAI just launched the Agents SDK – a simple yet powerful toolkit for building AI apps that can actually do things in the real world! I summarized everything you need to know about it and how it is going to be a game changer for anyone building agents. https://x.com/AtomSilverman/status/1899511053601698073

“NEW: OpenAI announces new tools for building agents. Here is everything you need to know: https://x.com/omarsar0/status/1899530784832459043

“🔎 Web search—integrate with a few lines of code and your agent can get up-to-date answers from the web (with citations). Available in preview: https://x.com/OpenAIDevs/status/1899531516448768103

OpenAI Plans Premium AI Agents with Monthly Fees up to $20,000
OpenAI is planning a tiered system of specialized AI agents targeting different professional levels. Every month the price seems to go higher. Last I saw was $2,000/month. Now, we’re up to $20,000 per month. According to leaked information, the company will offer three tiers: business professional agents ($2,000/month), advanced developer agents ($10,000/month), and PhD-level research agents ($20,000/month). This follows OpenAI’s existing Operator agent, which costs $200/month. The substantial pricing suggests OpenAI expects these agents to deliver significant value to corporate customers.

“OpenAI is reportedly planning specialized AI agents for tasks like Ph.D.-level research Three agent tiers expected: —Business professionals ($2k/mo) —Advanced devs ($10k/mo) —PhD-level researchers ($20k/mo) OAI charges $200/mo for its Operator agent: https://x.com/rowancheung/status/1897554278576685425

“The prices for OpenAI’s upcoming agents have been leaked. They range from $2000-20,000 per month. A corporation will only pay these prices if it receives a corresponding value from the agents. And I trust OpenAI to deliver. In any case, these costs indicate that the upcoming https://x.com/kimmonismus/status/1897321124687413581

Legal AI Agents Are Outperforming Humans On Grunt Work At Law Firms
Harvey has introduced Workflows, a new agent system that guides lawyers through complex legal tasks with human-level accuracy. In blind reviews, legal professionals rated work produced by these AI assistants as equal to or better than human-created content. The platform uses “reasoning models” that can plan steps, adapt based on results, and interact meaningfully with users. Workflows are tailored to specific practice areas, guiding users through processes while showing their reasoning. Initial evaluations show Harvey performs exceptionally well at structured drafting, analysis, and data extraction tasks, with lawyer evaluators often preferring Harvey’s more detailed and specific outputs in unstructured tasks.

“Introducing Harvey Agents: https://x.com/harvey__ai/status/1899491666429632907

“Harvey released Workflows AI agents for legal tasks, with reasoning, planning, and adapting capabilities In blind reviews, lawyer evaluators rated legal work produced by workflow agents as equal to or better than that of human lawyers https://x.com/rowancheung/status/1899713342484173043

Manus AI Research Agent Continues Hype Cycle Into Second Week
Manus, an AI agent gaining popularity in China, is an Anthropic’s Claude Sonnet wrapper (aka Manus doesn’t make the core model, but extends its capabilities). Manus achieved the top score on the GAIA benchmark, surpassing OpenAI’s Deep Research. According to Manus, the system provides users with isolated sandbox environments and access to 29 tools, including browser control through the open-source @browser_use. Users communicate directly with the executor agent rather than planners or other components. Industry observers note this demonstrates how companies can create powerful AI products by integrating existing foundation models with appropriate tooling rather than building models from scratch. Manus combines capabilities similar to Deep Research, Operator, and Claude Computer, handling tasks from social media analysis to financial transactions and research simultaneously. If you note the @ sign use for communications, you’ll see the similarity to OpenAI’s new API as well a Grok’s Twitter integrations using @ as a handle for agency (also like Discord, etc).

“Finally had a chance to try Manus. It’s a Claude wrapper, but a very clever one. Runs into the same issues as general agents, including getting stuck, but also capable of some good stuff. eg “get me the 10k for apple and visualize it in different ways to show me trends& details” https://x.com/emollick/status/1899148219335983207

“MANUS AI: HYPE VS. REALITY 🔍 @peakji (co-founder of @ManusAI_HQ) confirmed rumors: ✅ Built on Anthropic Claude Sonnet, not their own foundation model ✅Has access to 29 tools and uses @browser_use open-source for browser control ✅User communicates with executor agent and not https://x.com/_philschmid/status/1899046957860979178

“I think China’s second DeepSeek moment is here. This AI agent called ‘Manus’ is going crazy viral in China right now. Probably only a matter of time until it hits the US. It’s like Deep Research + Operator + Claude Computer combined, and it’s REALLY good. https://x.com/rowancheung/status/1898093008601395380

“I tested Manus AI. It’s the closest thing I’ve experienced to a truly autonomous AI agent. I can’t wait till this thing can use desktop apps like Premiere and Photoshop. It low key feels like baby AGI. Here’s my 10 minute review: https://x.com/bilawalsidhu/status/1898945929970843842

“The popular AI agent “Manus” launched in China is automating about 50 tasks, and the scenario is too dystopian. It’s said to be more accurate than DeepSeek. It can simultaneously perform SNS analysis, financial transactions, research, purchasing, and more. https://x.com/thinking_panda/status/1897951585990590469

Foxconn Launches First AI Language Model for Manufacturing
Taiwan’s Foxconn has introduced “FoxBrain,” its first large language model, aimed at improving manufacturing and supply chain operations. Built using Nvidia’s computing hardware and based on Meta’s Llama 3.1 architecture, the model was trained in just four weeks (!). FoxBrain is optimized for traditional Chinese and Taiwanese language styles with reasoning capabilities, putting its performance close to world-class standards. The iPhone assembler plans to expand the model’s applications through partnerships and open-source sharing, with Nvidia providing technical support through its Taiwan-based “Taipei-1” supercomputer. More details will be announced at Nvidia’s upcoming developer conference.

“iPhone manufacturer Foxconn announced FoxBrain, its first LLM with advanced reasoning —Developed in 4 weeks using Nvidia’s tech and support —Optimized for traditional Chinese —Performance near top models —Will be used in manufacturing and supply chain https://x.com/rowancheung/status/1899350947718926670

Google Open Sources World’s Best On-Device AI Model – Gemma 3 (this should really be the top story)
Yesterday’s “best model” is today’s free open source tiny model that can run on your phone. Google DeepMind has launched Gemma 3, a family of open models built from the same technology powering Gemini 2.0. These models are designed to run directly on devices, allowing developers to create AI applications that work wherever users need them. Gemma 3 comes in sizes ranging from 1B to 27B parameters and features a 128K token context window with support for over 140 languages. The largest 27B model includes vision capabilities, offers commercial use under a permissive license, and can run on a single H100 GPU. Its memory-efficient architecture makes it suitable for consumer devices like laptops. According to Google’s research paper, the models were trained on Google TPU chips using reinforcement learning to enhance capabilities in various domains, with the 27B and 12B versions trained on 14 trillion tokens.

“Google released Gemma 3 – their new Open-Source multimodal model family Here is everything you need to know in one thread: Gemma 3 27B currently ranks 9th in the LMSLOP arena beating models like o1-mini and o3-mini, DeepSeek-V3, Claude 3.7 Sonnet and Qwen2.5-Max In comparison https://x.com/scaling01/status/1899792217352331446

“You can now start building with Gemma 3. 🛠️ Made from the same tech powering Gemini 2.0, these are our state-of-the-art open models designed to run fast and directly on devices — helping developers create AI applications wherever people need them. → https://x.com/GoogleDeepMind/status/1900549631647367268

“Google joins the smol models club with Gemma3 1B! Here’s a timeline showing the acceleration of smol (1B-2B) model releases over the past 18 months. This space is heating up fast! 🔥 https://x.com/LoubnaBenAllal1/status/1899873487231345062

“still trying to digest this 🤯 Gemma 3 is the biggest thing that happened in AI since DeepSeek R1 release https://x.com/mervenoyann/status/1899879621396750801

“Gemma 3 27B’s Intelligence vs. Size positioning is compelling compared to other smaller, open weights models We have completed our independent intelligence evaluations of Gemma 3 27B and have benchmarked an Artificial Analysis Intelligence Index of 38. While Gemma 3 27B is not https://x.com/ArtificialAnlys/status/1900579291404046696

“🎉 Congrats to @GoogleDeepMind on Gemma-3-27B, the newest and one of the strongest open models in Arena! 💠 Top 10 overall – beating out many proprietary models with only 27B parameter 💠 2nd best open model only below DeepSeek-R1 💠 128K context window Check out their blog to https://x.com/lmarena_ai/status/1899729292617277501

“Gemma 3 is available in a range of sizes, from 1B to 27B – and comes with a 128K token context window as well as support for over 140 languages. https://x.com/GoogleDeepMind/status/1900549635267014878

Gemma 3: Google’s new open model based on Gemini 2.0 https://blog.google/technology/developers/gemma-3/

“Gemma 3 is best in class for a VLM that runs on 1 GPU. Should make RL fine tuning feasible. Also Academic researchers can apply for Google Cloud credits (worth $10,000 per award) to accelerate their Gemma 3-based research.” / X https://x.com/sirbayes/status/1900520172059815986

“I’m so happy to announce Gemma 3 is out! 🚀 🌏Understands over 140 languages 👀Multimodal with image and video input 🤯LMArena score of 1338! 📏Context window of 128k Available in AI Studio, Hugging Face, Ollama, Vertex, and your favorite OS tools 🚀Download it today! https://x.com/osanseviero/status/1899726995170210254

“Gemma 3 is here and its the best open non-reasoning model on LMSYS! 🚀 @GoogleDeepMind Gemma 3 is an open, multimodal (text + vision), multilingual LLM with a context of 128k tokens and comes in 4 sizes! TL;DR: 4️⃣ Four sizes with 1B, 4B, 12B, 27B as pre-trained and https://x.com/_philschmid/status/1899726907022963089

“Some thoughts about Gemma 3. The tech report (as with most labs) is not really detailed but still provides some interesting info https://x.com/nrehiew_/status/1899882552946532498

“Gemma 3 can understand videos, and it’s more powerful than you think it is ⏯️ I put together a short notebook on interleaving frames and doing video inference 📖 you’re welcome 🤝 https://x.com/mervenoyann/status/1899823530524447133

“Google is BACK!! Welcome Gemma3 – 27B, 12B, 4B & 1B – 128K context, multimodal AND multilingual! 🔥 Evals: > On MMLU-Pro, Gemma 3-27B-IT scores 67.5, close to Gemini 1.5 Pro (75.8) > Gemma 3-27B-IT achieves an Elo score of 133 in the Chatbot Arena, outperforming larger LLaMA 3 https://x.com/reach_vb/status/1899728796586025282

“Today we launched Gemma 3, our most advanced and portable open models yet. This collection of lightweight models is designed to run fast, directly on devices like smartphones and laptops, to help devs create responsible AI apps at scale. Learn more ↓ https://x.com/Google/status/1899916049002217855

“Gemma3 technical report detailed analysis 💎 1) Architecture choices: > No more softcaping, replace by QK-Norm > Both Pre AND Post Norm > Wider MLP than Qwen2.5, ~ same depth > SWA with 5:1 and 1024 (very small and cool ablation on the paper!) > No MLA to save KV cache, SWA do https://x.com/eliebakouch/status/1899790607993741603

It’s “Bank of America Official”: Humanoid Robots Are Coming!
Bank of America forecasts individual humanoid robot costs dropping to $35K by late 2025 and $17K by 2030, potentially driving annual sales to 1 million units by 2030 (I think this is low). The report projects 3 billion robots in operation by 2060 (a brave 30-year prediction), with 65% in homes, 32% in service sectors, and 3% in industry. Nvidia, Tesla, and Meta are positioned as likely major beneficiaries of this expanding market. I would add Google to this list personally.

“New report by Bank of America analysts: “The era of humanoid robots is coming.” ⦿ The cost of a humanoid robot estimated to be $35K by the end of 2025 and $17K by 2030. ⦿ Global annual sales could reach 1 million units by 2030, and the number of humanoid robots in operation https://x.com/TheHumanoidHub/status/1899135995398029741

Google DeepMind Launches AI Models to Improve Robot Intelligence
Speaking of Google robots, hot on the heels of Figure launching their Helix multimodal robot embodiment model, here comes the OG. Google DeepMind has introduced two AI models designed to help robots better understand and interact with the physical world. Gemini Robotics, built on the Gemini 2.0 platform, combines vision, language, and action capabilities to handle complex tasks like folding origami and packing lunch boxes, even when facing situations it wasn’t specifically trained for. The second model, Gemini Robotics-ER, specializes in spatial reasoning to improve robot control, allowing robots to identify interactive parts of objects (like a mug handle) while avoiding unsafe contact points. According to DeepMind, their models more than double performance on generalization benchmarks compared to existing technologies. The company is partnering with Apptronik to develop humanoid robots, with Boston Dynamics, Agility Robots, and Enchanted Tools serving as early testers.

“Google DeepMind introduced two foundational models for embodied reasoning, enabling robots to comprehend, react, and take action in the physical world: ⦿ Gemini Robotics – built on Gemini 2.0. Integrates vision, language, and action for real-world dexterity, . ⦿ Gemini https://x.com/TheHumanoidHub/status/1899875342221009265

“Robots must be able to interact seamlessly with humans. 🤝 When it’s interrupted or situations change, Gemini Robotics can adjust its actions on the fly. This level of steerability will empower us to better work with future robot assistants in the home, at work and beyond. https://x.com/GoogleDeepMind/status/1899839632772067355

“We’re partnering with @Apptronik to build the next generation of humanoid robots with Gemini 2.0 – and opening our Gemini Robotics-ER model to trusted testers such as Agile Robots, @AgilityRobotics, @BostonDynamics and @EnchantedTools. Find out more → https://x.com/GoogleDeepMind/status/1899839644302270671

“Meet Gemini Robotics: our latest AI models designed for a new generation of helpful robots. 🤖 Based on Gemini 2.0, they bring capabilities such as better reasoning, interactivity, dexterity and generalization into the physical world. 🧵 https://x.com/GoogleDeepMind/status/1899839624068907335

“They also accomplished tasks not seen in training, showing the ability to generalize to new scenarios. 💡 We show that on average, Gemini Robotics more than doubles performance on a comprehensive generalization benchmark – compared to other state-of-the-art https://x.com/GoogleDeepMind/status/1899839635720663463

Google’s Gemini AI Can Leverage Your Search History for Personalized Responses – More Google App Integration Coming Soon
Google launched personalization for its Gemini AI assistant, integrating across Google apps, but starting with Search. For the first step, personalization allows Gemini to access user Search history to provide more relevant answers. Powered by the Gemini 2.0 Flash Thinking model, the integration analyzes whether your past searches can improve responses to questions about vacation ideas, hobbies, or content creation. Users maintain control through clear permission settings and can disconnect access at any time. The feature is available now for Gemini subscribers on web (with mobile coming soon) in over 45 languages. Google plans to expand personalization to include Photos and YouTube data in coming months, while soon extending the ability to reference past conversations to all Gemini users.

Introducing Gemini with personalization https://blog.google/products/gemini/gemini-personalization/

Meta Announces Custom In-house Computing Chip and AI-Specific Infrastructure Upgrades
Last week Meta announced major infrastructure developments designed specifically for AI computing needs. The company introduced its first custom silicon chip (MTIA) for running AI models, unveiled an AI-optimized data center design with liquid cooling capabilities, and expanded its Research SuperCluster to include 16,000 GPUs. These infrastructure upgrades will support Meta’s growing AI compute demands across its apps serving over three billion users. The company emphasizes that controlling their entire technology stack allows for targeted customization, which they believe will be crucial as AI continues advancing over the next decade.

“Meta is testing a new, in-house chip to cut costs on AI training Manufactured by TSMC, the chip is part of the company’s MTIA series and is likely to be deployed in 2026 It will help Meta cut reliance on Nvidia’s pricey GPUs for training large models https://about.fb.com/news/2023/05/metas-infrastructure-for-ai/

Pentagon Signs Deal To Use AI Agents for Military Planning
The Pentagon has signed a contract with Scale AI to integrate artificial intelligence agents into military decision-making and operations planning, marking the most significant AI deployment in Western defense to date. The Defense Innovation Unit’s Thunderforge project will use AI to simulate war scenarios, support mission planning, and analyze information for faster strategic decisions. The system will initially deploy to US Indo-Pacific and European Commands before expanding to all 11 combatant commands. Scale AI will lead the implementation team alongside Anduril and Microsoft, with humans remaining the final decision-makers despite AI assistance. The move comes amid ongoing debates about AI in warfare, with tech workers at Google and Microsoft facing termination for protesting similar defense contracts. Defense officials state the system includes safety protocols and transparency features to ensure users can trace the AI’s reasoning process.

Pentagon to give AI agents a role in planning, operations • The Register https://www.theregister.com/2025/03/05/dod_taps_scale_to_bring/

Marketing Agency Giant WPP Invests in Stability AI for Marketing and Entertainment Production
Agency behemoth WPP has formed a strategic partnership with Stability AI to advance content creation across multiple media formats. The collaboration gives WPP access to Stability AI’s open visual models for image, video, 3D, and audio production, integrated directly into WPP’s AI operating system. Through a joint R&D pipeline, the companies will develop new approaches for creative ideation and concept testing. “This collaboration is unique in its focus on the visual media industry,” said WPP’s CTO Stephan Pretorius. The partnership adds Stability AI to WPP’s £300m annual AI investment portfolio, joining other notable Stability AI backers including Greycroft, Coatue Management, and former Facebook president Sean Parker, who serves as Executive Chairman.

Stability AI Announces Investment from WPP and New Partnership to Shape the Future of Media and Entertainment Production — Stability AI https://stability.ai/news/stability-ai-announces-investment-from-wpp-and-new-partnership-to-shape-the-future-of-media-and-entertainment-production

Google AI Studio Now Supports YouTube Video Processing, Transciption and Detailed Contextual Analysis With Gemini Models
Google AI Studio has added YouTube video link support to the Gemini API, allowing developers to process videos directly through a simple URL. Gemini can analyze videos up to 90 minutes long, extracting information, providing descriptions, and answering questions about video content. The system processes both visual frames and audio tracks, sampling video at one frame per second. While Gemini Pro can handle videos up to 2 hours with its 2M context window, users are limited to uploading 8 hours of public YouTube video per day, with only one video permitted per request.

“Introducing YouTube video 🎥 link support in Google AI Studio and the Gemini API. You can now directly pass in a YouTube video and the model can usage its native video understanding capabilities to use that, with just a link! 🚢 https://x.com/OfficialLoganK/status/1899914266062577722

Reinforcement Learning Pioneers Win Turing Award, Computing’s Highest Honor
Andrew Barto and Richard Sutton received the 2024 ACM A.M. Turing Award for developing reinforcement learning (RL), a fundamental approach that powers many modern AI systems. The researchers, who began their work in the 1980s, established the mathematical foundations and key algorithms that enable AI systems to learn from experience rather than explicit programming. Their techniques were behind major AI breakthroughs including AlphaGo’s victory over human champions and ChatGPT’s ability to understand human preferences. The $1 million prize recognizes their lasting impact—their textbook has been cited over 75,000 times and RL continues driving advances in robotics, network optimization, chip design, and even our understanding of the human brain’s dopamine system.

“Meet the recipients of the 2024 ACM A.M. Turing Award, Andrew G. Barto and Richard S. Sutton! They are recognized for developing the conceptual and algorithmic foundations of reinforcement learning. Please join us in congratulating the two recipients! https://x.com/TheOfficialACM/status/1897225672935735579

The TechCrunch AI Glossary
TechCrunch has compiled a glossary that defines key technical terms used in artificial intelligence reporting. The resource aims to help readers better understand complex AI concepts that scientists and researchers commonly use in their work. TechCrunch plans to update the glossary regularly as AI technology advances and new terminology emerges, particularly as researchers develop innovative methods and identify potential safety concerns.

“📊 Introducing the Agentic LLM Leaderboard: A new way to compare how different LLMs perform when powering agents! See how models stack up on various benchmarks, with clear comparisons between vanilla and agentic setups. Built with smolagents framework 🤖 Kudos @AymericRoucher https://x.com/fdaudens/status/1899251429518270813

13 AI Visuals and Charts: Week Ending March 14, 2025

“I have mixed feelings about the phrase “vibe coding” but I really feel the appeal of conjuring something to life with words. Veo2: “sweeping brutalist arch looming over a suburban town as a man & woman, their arms completely covered with flowers, bike through the flooded streets” https://x.com/emollick/status/1897841941296927096

“⚙️ It goes head to head with our team to wrap a timing belt around gears – a feat that’s harder than you think ↓ https://x.com/GoogleDeepMind/status/1899839630242955536

Google DeepMind on X: “They also accomplished tasks not seen in training, showing the ability to generalize to new scenarios. 💡 We show that on average, Gemini Robotics more than doubles performance on a comprehensive generalization benchmark – compared to other state-of-the-art” https://x.com/GoogleDeepMind/status/1899839635720663463

Google DeepMind on X: “Our model Gemini Robotics-ER allows roboticists to tap into the embodied reasoning of Gemini. 🌐 For example, if a robot came across a coffee mug, it could detect it, use ‘pointing’ to recognize parts it could interact with – like the handle – and recognize objects to avoid” / X https://x.com/GoogleDeepMind/status/1899839638493077892

“next. fucking. level https://x.com/multimodalart/status/1899887369811009802

“We made an agent that can generate @3blue1brown style videos about *anything*. We’ve also released a benchmark for animated educational videos so we can systematically improve this tech! Some notes below:” / X https://x.com/KrishRShah/status/1897062578347757940

AI Video Tool Showdown: Comparing Google Veo 2 And OpenAI Sora in 2025 https://www.forbes.com/sites/moinroberts-islam/2025/03/06/2025s-ai-video-showdown-comparing-google-veo-2-and-openai-sora/

“Picking objects up from the floor will be an essential skill for general-purpose humanoid robots. Atlas can do it, and among all bipedal robots, OG Atlas is the only other one I can recall demonstrating this capability. https://x.com/TheHumanoidHub/status/1897752029906583606

“We’re designing Atlas to do anything and everything, but we get there one step at a time. See why we started with part sequencing, how we are solving hard problems, and how we’re delivering a humanoid robot with real value. https://x.com/BostonDynamics/status/1897298172210225280

“AGIBOT’s Lingxi X2 robot can ride a scooter, a hoverboard, and a self-driving bike. https://x.com/TheHumanoidHub/status/1899382998258364747

“Monocular full body pose estimation has gotten really good. You can now do 3D motion capture with one video feed. https://x.com/bilawalsidhu/status/1899659187371528595

“Playing guitar, reskinned with Runway’s restyle feature — pretty epic for digital character replacement. I’m genuinely impressed by how well the fretting & strumming hands hold up. Not perfect yet, but pulling this off would basically be impossible with Viggle or even Wonder https://x.com/bilawalsidhu/status/1897868306872209825

“We taught our robot to trail drive and it nails it zero-shot 🤯 Its week 1 at our new test facility in the Santa Cruz mountains. Our vehicle has never seen this trail before, in fact it has been trained on very little trail driving data to date. Watch it navigate this terrain https://x.com/adcock_colby/status/1899486992548913455