About This Week’s Covers
This week’s cover is dedicated to the memory of Gene Hackman and is inspired by the classic film ‘The Conversation’.

This week’s category covers are all 1970s movie-poster-style themes in homage to The Conversation, created automatically using prompts written by Claude and images generated by Ideogram.

This Week By The Numbers
Total Organized Headlines: 586
- AGI: 20 stories
- Agents and Copilots: 166 stories
- Amazon: 5 stories
- Anthropic: 53 stories
- Apple: 14 stories
- Audio: 20 stories
- Augmented Reality (AR/VR): 16 stories
- Business and Enterprise: 35 stories
- Chips and Hardware: 23 stories
- Education: 11 stories
- Ethics/Legal/Security: 45 stories
- Google: 25 stories
- Images: 13 stories
- International: 31 stories
- Locally Run: 20 stories
- Meta: 14 stories
- Microsoft: 15 stories
- Mobile: 11 stories
- Multimodal: 28 stories
- Open Source: 63 stories
- OpenAI: 81 stories
- Perplexity: 22 stories
- Podcasts/YouTube: 9 stories
- Publishing: 7 stories
- RAG: 6 stories
- Robotics Embodiment: 39 stories
- Science and Medicine: 21 stories
- Technical and Dev: 67 stories
- Video: 13 stories
- X: 15 stories
This Week’s Executive Summaries
There were no huge headlines this week, but there were many important announcements, resulting in 17 executive summaries below! I think the biggest story is going to be a new product called Manus, which arrived right at the tail end of the week; expect a lot more news on it in next week’s edition. OpenAI plans to charge $20,000 a month for super-powerful AI agents. While my favorite model is still Claude 3.7 Sonnet, I do like Perplexity’s Deep Research feature, and the fact that they are integrating it into enterprise document search is exciting. There have been big developments in optical character recognition as well as speech-to-text. Alibaba released a strong model, and OpenAI released a very expensive, compute-intensive model that apparently improves the subtle conversational “vibe” and will serve as a foundation for their future models.
I added dedicated categories for DeepSeek, Figure, Llama, Mistral, NVIDIA, and Qwen. Previously, I folded them under Open Source, Robotics, and Chips and Hardware.
There’s a lot in here this week, and I hope you enjoy it.
TSMC Expands U.S. Chip Manufacturing with $100 Billion Investment
Large chip companies like Nvidia and Apple rarely manufacture their own chips. In fact, one company, Taiwan Semiconductor Manufacturing Co. (TSMC), dominates the chip manufacturing industry, which is why this is the number one story of the week. If you want to learn more about TSMC, I recommend listening to Lex Fridman’s podcast (linked below) starting at the 1:31:00 mark. TSMC plans to invest an additional $100 billion in U.S. operations, bringing its total U.S. investment to $165 billion. The expansion includes three new chip manufacturing plants and two packaging facilities in Arizona, where TSMC’s existing factory already produces 4-nanometer chips. While the Biden administration passed the $280 billion CHIPS Act to boost domestic chip production after pandemic-related shortages, Trump has favored using tariff threats on imported chips instead of tax incentives. Notably, TSMC’s investment comes amid ongoing tensions between China and Taiwan, and the U.S. expansion could provide some technology security should complications arise with Taiwan.
Trump to make investment announcement as he meets with TSMC CEO | AP News https://apnews.com/article/trump-tsmc-chip-manufacturing-tariffs-42980704ffca62e823182422ee4b7b83
DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters | Lex Fridman Podcast #459 https://youtu.be/_1f-o0nqpEI?si=ZRBSShbvbJKB5FtB&t=5466
Manus is the ‘world’s first general agent’ – another DeepSeek moment?
I have a hunch this one is going to be a major story next week; it broke a bit late this week. “Manus is a general AI agent that bridges minds and actions: it doesn’t just think, it delivers results. Manus excels at various tasks in work and life, getting everything done while you rest.” It’s worth checking out the website, watching the video, and looking at the examples. Stay tuned for more information next week. Hype alert for sure.
Manus https://manus.im/
OpenAI Plans Premium AI “Agents” with Monthly Fees Up to $20,000
If you look at the headline right above this one, you’ll see Manus as a general agent. And if you remember just a few weeks ago, OpenAI’s research agent got blown out of the water by DeepSeek and Perplexity. So put on your seatbelt and look at the price tag on OpenAI’s new agents. I’ve seen this number floating around for several weeks, but the fact that it’s not going away is worth noting. OpenAI is developing specialized AI “agents” with subscription costs ranging from $2,000 to $20,000 monthly, according to The Information. The product lineup will include agents for sales lead management, software engineering, and PhD-level research assistance. While launch timing and customer eligibility remain unannounced, SoftBank, an OpenAI investor, has reportedly committed $3 billion to these agent products this year. The high pricing structure comes as OpenAI seeks new revenue streams after reporting approximately $5 billion in losses last year.
OpenAI reportedly plans to charge up to $20,000 a month for specialized AI ‘agents’ | TechCrunch https://techcrunch.com/2025/03/05/openai-reportedly-plans-to-charge-up-to-20000-a-month-for-specialized-ai-agents/
Perplexity Launches Enterprise File Integration
Keep an eye on Perplexity! Perplexity has connected its AI research platform to Google Drive, OneDrive, and SharePoint, allowing companies to search across both internal documents and web content simultaneously. The feature provides enterprise-grade security while enabling employees to analyze company files and receive answers drawn directly from their organization’s documents. According to Perplexity, companies using their Enterprise Pro plan report saving more than 10 hours per employee per week.
“Introducing Perplexity Deep Research for Enterprise Data. Perplexity now connects to Google Drive, OneDrive, and SharePoint, enabling deep research across company files and the web with enterprise-grade security and compliance. Available now for Enterprise Pro Users. https://x.com/perplexity_ai/status/1895554562104443073
Alibaba’s QwQ-32B Shows Impressive Math Skills Despite Small Size
While the big companies get most of the attention, it’s important to continually track the secondary players, especially in the open source market. You never know when another DeepSeek will arrive. For the past two years, Mistral and Qwen have been two of the main open models. It’s worth noting that Qwen is owned by Alibaba; not every large AI model comes from a forward-facing AI company. ByteDance is another example, as of course are Meta and Amazon. This week Alibaba launched QwQ-32B, an open-weights reasoning model showing strong early benchmark results. On the AIME 2024 math competition, it scored 78%, outperforming DeepSeek R1, while achieving 59.5% on GPQA Diamond, slightly behind Gemini 2.0 Flash’s 62%. What makes these results notable is QwQ-32B’s efficiency: it operates with just 32 billion parameters compared to DeepSeek R1’s 671 billion. However, the models use different precision formats (BF16 vs. FP8), which affects their storage requirements and performance characteristics on specialized hardware. More comprehensive benchmark results are expected soon.
“Alibaba launches QwQ-32B, an open weights reasoning model that may approach DeepSeek R1’s level of intelligence We’ve been running evals on it all night and we’ve only gotten our scores back for GPQA Diamond and AIME 2024 thus far: ➤ GPQA Diamond: 59.5%, placing QwQ materially https://x.com/ArtificialAnlys/status/1897701015803380112
“Qwen 32B QwQ – no. 1 trending on Hugging Face – SoTA after SoTA, the competition is heating up! 🔥 GG @Alibaba_Qwen https://x.com/reach_vb/status/1897974348503208081
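To see why the parameter-count and precision difference matters in practice, here is a back-of-the-envelope weight-storage calculation (illustrative only; real checkpoint files include extra tensors and metadata, so actual sizes differ somewhat):

```python
def weight_gb(params_billion, bytes_per_param):
    """Approximate raw weight storage in gigabytes."""
    return params_billion * 1e9 * bytes_per_param / 1e9

# QwQ-32B in BF16 uses 2 bytes per parameter;
# DeepSeek R1 in FP8 uses 1 byte per parameter.
qwq_gb = weight_gb(32, 2)    # 64.0 GB
r1_gb = weight_gb(671, 1)    # 671.0 GB

print(f"QwQ-32B ≈ {qwq_gb:.0f} GB, DeepSeek R1 ≈ {r1_gb:.0f} GB")
```

Even at the higher BF16 precision, QwQ-32B’s weights are roughly a tenth the size of R1’s, which is what makes the benchmark parity so striking.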
OpenAI Launches GPT-4.5 with Focus on “Human-Like” Interactions
Does it feel like every single model is always the “best” on every single leaderboard? It sure does feel that way to me this week. There are so many models, with each one winning in its own individual category. In the case of OpenAI’s new model, GPT-4.5, I think they are moving more toward a DeepSeek-style approach of automated learning without human reinforcement. The big takeaway for most OpenAI employees seems to be that the model is more conversational. It’s not meant to be a multimodal or a reasoning model yet. CEO Sam Altman describes it as “the first model that feels like talking to a thoughtful person,” noting its improved ability to provide genuinely helpful advice. While GPT-4.5 outperforms previous models on coding benchmarks like SWE-Lancer, it doesn’t significantly advance capabilities in math or science compared to reasoning-focused models like o1. Former OpenAI researcher Andrej Karpathy explains that GPT-4.5 represents a 10x increase in computing power over GPT-4, bringing subtle but meaningful improvements in creativity, understanding nuance, and reducing hallucinations. The model is initially available only to Pro users due to GPU shortages, with Plus tier access expected next week as OpenAI adds tens of thousands more GPUs.
“BREAKING News: @OpenAI’s GPT-4.5 now tops the Arena leaderboard! With over 3k votes, GPT-4.5 landed #1 across ALL categories, and singularly #1 under Style Control / Multi-Turn 🥇 Huge congratulations to @OpenAI on this impressive milestone! 🙌 View below for more insights on https://x.com/lmarena_ai/status/1896590146465579105
“GPT-4.5 topped all categories across the board, with a clear leadership in Multi-Turn. 🥇 Multi-Turn 💠 Hard Prompts 💠 Coding 💠 Math 💠 Creative Writing 💠 Instruction Following 💠 Longer Query https://x.com/lmarena_ai/status/1896590150718922829
“GPT 4.5 + interactive comparison 🙂 Today marks the release of GPT4.5 by OpenAI. I’ve been looking forward to this for ~2 years, ever since GPT4 was released, because this release offers a qualitative measurement of the slope of improvement you get out of scaling pretraining” / X https://x.com/karpathy/status/1895213020982472863
“GPT-4.5 is ready! good news: it is the first model that feels like talking to a thoughtful person to me. i have had several moments where i’ve sat back in my chair and been astonished at getting actually good advice from an AI. bad news: it is a giant, expensive model. we” / X https://x.com/sama/status/1895203654103351462
“Breaking: OpenAI just released GPT-4.5, the startup’s largest AI model to date. Available now to Pro ($200/mo tier) users and developers on paid tiers via API. Everything else you need to know about the highly-anticipated launch: https://x.com/rowancheung/status/1895202496907546718
Dan Hendrycks and Eric Schmidt Warn U.S. and China Have Entered An AI Security Standoff
Dan Hendrycks and Eric Schmidt warn that China’s DeepSeek R1 model marks a critical turning point in AI competition, comparing the stakes to nuclear weapons development. Their new paper introduces “Mutual Assured AI Malfunction” (MAIM), a deterrence framework where nations threaten sabotage against rivals approaching superintelligence. The two argue the U.S. must strengthen cyberattack capabilities, implement stricter AI chip export controls, and reduce dependence on Taiwan for computing power to maintain strategic advantage.
The Nuclear-Level Risk of Superintelligent AI | TIME https://time.com/7265056/nuclear-level-risk-of-superintelligent-ai/
Amazon Developing AI Reasoning Model Under Nova Brand
Alibaba can’t have all the “retail does AI” news this week with Qwen. And just as Microsoft invests in OpenAI yet makes its own models, Amazon is investing in Anthropic while building its own as well. Amazon is working on an AI model with advanced “reasoning” capabilities, potentially launching in June under its Nova brand. The model would take a step-by-step approach to solving problems, improving reliability for math and science tasks. According to Business Insider, Amazon plans to use a “hybrid” architecture similar to Anthropic’s Claude 3.7 Sonnet, allowing both quick answers and more complex thinking in one system. While Amazon reportedly aims to make its model more price-efficient than competitors, this could prove challenging as companies with open-weight models like DeepSeek are delivering extremely affordable models within weeks of closed-model releases. Also, June feels like years away.
Amazon is reportedly developing its own AI ‘reasoning’ model | TechCrunch https://techcrunch.com/2025/03/04/amazon-is-reportedly-developing-its-own-ai-reasoning-model/
Apple may be preparing Gemini integration in Apple Intelligence
Apple has introduced backend code referencing “Google” as a third-party model choice for Apple Intelligence, according to a change spotted in Apple firmware following the release of the first iOS 18.4 beta last week.
Apple may be preparing Gemini integration in Apple Intelligence. | The Verge https://www.theverge.com/news/618087/apple-could-be-preparing-to-add
Larry Page launches AI manufacturing startup
Google co-founder Larry Page is building a company called Dynatomics that applies AI to product manufacturing. The startup, led by former Kittyhawk CTO Chris Anderson, aims to create AI systems that design optimized objects and automate their production. While Page brings significant resources to this venture, he joins several companies already working in the AI manufacturing space, including Orbital Materials (materials discovery), PhysicsX (engineering simulations), and Instrumental (factory anomaly detection).
Google co-founder Larry Page reportedly has a new AI startup | TechCrunch https://techcrunch.com/2025/03/06/google-co-founder-larry-page-reportedly-has-a-new-ai-startup/
Andrej Karpathy Tutorial Offers Practical Guide to Using Large Language Models in Everyday Life
AI researcher Andrej Karpathy has published a comprehensive 2-hour YouTube video that explores his everyday use of large language models. Karpathy walks viewers through the LLM ecosystem with concrete examples from his personal usage, covering interactions with models and tools like ChatGPT, Cursor, and Claude, and their various capabilities including tool usage, file handling, and media processing. He demonstrates how these systems can perform internet searches, conduct research, analyze data, generate visualizations, process audio, and work with images and video. The video also explores customization options such as ChatGPT’s memory features and the creation of specialized GPTs. His videos are often considered better than MIT college courses while remaining accessible.
“New 2h11m YouTube video: How I Use LLMs This video continues my general audience series. The last one focused on how LLMs are trained, so I wanted to follow up with a more practical guide of the entire LLM ecosystem, including lots of examples of use in my own life. Chapters https://x.com/karpathy/status/1895242932095209667
Inception Labs Unveils Mercury: The First Diffusion-Based Language Model
This is notable because it’s the first of its kind (to my knowledge). Inception Labs has introduced Mercury, a diffusion large language model (dLLM) that generates text all at once rather than sequentially. The company claims Mercury matches the performance of models like GPT-4o Mini and Claude 3.5 Haiku while operating up to 10x faster, achieving over 1000 tokens per second on NVIDIA H100s without specialized hardware. Former OpenAI researcher Andrej Karpathy noted this represents a significant departure from conventional autoregressive LLMs, comparing it to the diffusion approach already common in image and video generation. Karpathy suggests Mercury may exhibit unique characteristics and encourages people to test it to discover potential new strengths and weaknesses.
“We are excited to introduce Mercury, the first commercial-grade diffusion large language model (dLLM)! dLLMs push the frontier of intelligence and speed with parallel, coarse-to-fine text generation. https://x.com/InceptionAILabs/status/1894847919624462794
“This is interesting as a first large diffusion-based LLM. Most of the LLMs you’ve been seeing are ~clones as far as the core modeling approach goes. They’re all trained “autoregressively”, i.e. predicting tokens from left to right. Diffusion is different – it doesn’t go left to” / X https://x.com/karpathy/status/1894923254864978091
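Karpathy’s point about the structural difference can be caricatured in a few lines. This is a toy sketch, not Mercury’s actual algorithm: the “model calls” below are stand-ins, and real diffusion LMs iteratively denoise token distributions. The point it illustrates is that an autoregressive decoder needs one sequential pass per token, while a diffusion-style decoder updates every position in parallel, so its sequential pass count is fixed regardless of output length:

```python
def autoregressive_decode(target):
    # One model call per emitted token, each conditioned on the prefix.
    out, passes = [], 0
    for token in target:
        out.append(token)
        passes += 1
    return out, passes

def diffusion_style_decode(target, steps=4):
    # Every position is refined in parallel on each step, so the number
    # of sequential passes is `steps`, independent of sequence length.
    out, passes = ["[MASK]"] * len(target), 0
    for _ in range(steps):
        out = list(target)  # stand-in for one coarse-to-fine refinement pass
        passes += 1
    return out, passes

sentence = "the quick brown fox jumps over the lazy dog".split()
_, ar_passes = autoregressive_decode(sentence)     # 9 sequential passes
_, diff_passes = diffusion_style_decode(sentence)  # 4, regardless of length
```

Fewer sequential passes is one plausible source of the claimed 1000+ tokens/second throughput, since each parallel step can saturate the GPU.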
AI financial analyst Endex emerges from stealth with OpenAI partnership
Endex and OpenAI unveiled their collaboration on an AI platform that processes complex financial data like a human analyst. The tool retrieves, analyzes, and identifies discrepancies in financial information, claiming to outperform traditional search-based AI systems. The platform can handle earnings summaries, transaction overviews, and due diligence tasks, delivering results as emails, documents, or slide decks. Tests show financial experts preferred Endex’s AI responses 70% of the time over competing systems. Unlike standard AI that simply retrieves information, Endex uses OpenAI’s reasoning models (including o1 and o3-mini) to think through financial data the way experienced analysts do, allowing investment professionals to focus on decision-making rather than data verification.
Announcing Endex: An AI Financial Analyst Today we’re coming out of stealth and announcing our partnership with OpenAI https://openai.com/index/endex/
Microsoft’s Phi-4 Multimodal Takes Top Spot in Speech Recognition
Microsoft’s Phi-4 Multimodal model has claimed first place on the Hugging Face Speech Recognition Leaderboard with a 6.14% Word Error Rate (WER), outperforming competitors like NVIDIA Canary and OpenAI Whisper. Released under MIT license, the model goes beyond basic transcription to handle speech summarization and speaker identification (diarization). It also functions as an Audio Language Model, making it versatile for multiple audio processing tasks.
“🚀 Phi-4 just dropped & it’s now #1 on the Open ASR Leaderboard! 🏆🔥 But you can make it even better! 🎯 Fine-tune it for your specific use case—whether it’s handling noisy audio, improving accuracy in low-resource languages, or custom domain adaptation! Try it out with this” / X https://x.com/Tu7uruu/status/1895161283743490548
“🔥 Phi-4-multimodal-instruct just landed on the Hugging Face Speech Recognition Leaderboard — and it’s ranked #1 with an impressive 6.14% WER! Big moves in ASR — check out how it compares to Whisper, Canary & more 👀 https://x.com/Tu7uruu/status/1896948226743558530
“BOOM! Phi 4 Multimodal (MIT licensed) – the new king of the Open ASR Leaderboard 💥 Beats Nvidia Canary, OpenAI Whisper and more 🤩 Bonus: the model can do much more – speech summarisation, diarization and doubles up as an Audio LM too! https://x.com/reach_vb/status/1897014754943910266
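For readers unfamiliar with the metric: Word Error Rate is the word-level edit distance (substitutions + deletions + insertions) between the model’s transcript and the reference, divided by the number of reference words. A minimal implementation, for illustration (leaderboards like Open ASR also apply text normalization before scoring, which is omitted here):

```python
def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1/6 ≈ 0.167
```

So Phi-4’s 6.14% WER means roughly one word-level error per sixteen reference words.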
Sanctuary AI’s Robot Hands Approach Human-Like Touch Sensitivity
Sanctuary AI’s Phoenix robot has advanced finger sensors that nearly rival human touch capabilities. Each finger pad contains a seven-cell touch sensor using micro-barometers, similar to what’s found in smartphones. These sensors can detect pressure as slight as 5 millinewtons, approaching the 3 millinewton sensitivity threshold of human fingers. This marks a big step toward robots that can interact with delicate objects using tactile feedback.
“Each finger pad on Sanctuary AI’s Phoenix robot features a seven-cell touch sensor with micro-barometers, like those made affordable in smartphones. These sensors are sensitive to 5 millinewtons (mN) – getting close to the 3 mN sensitivity of human touch. https://x.com/TheHumanoidHub/status/1896627478116057300
Sesame Launches Voice AI That Rivals Human Interaction
Sesame launched its Conversational Speech Model (CSM), aiming to address the “emotional flatness” problem in current voice assistants. The system is designed to create conversations that convey emotional nuance through tone, timing, and contextual awareness. Unlike standard text-to-speech systems, CSM processes both text and audio using a two-transformer architecture that maintains conversational context to generate more natural-sounding responses. Testing shows CSM achieves near-human performance in basic speech quality, with listeners unable to distinguish between CSM-generated and human speech in non-contextual evaluations. However, when evaluating appropriateness within conversation flows, human speech still outperforms the AI.
“At Sesame, we believe in a future where computers are lifelike. Today we are unveiling an early glimpse of our expressive voice technology, highlighting our focus on lifelike interactions and our vision for all-day wearable voice companions. https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice
“The new AI voice from Sesame really is a powerful illustration of where AI is going. This is all real-time, from my browser. Excellent use of disfluencies, pauses, even intakes of breathe really make this seem like a human, though bits of uncanniness remain, at least for now. https://x.com/emollick/status/1896757383566950466
Adam Silverman Predicts UI Design Will Shift From Human to Agent Experience
Adam Silverman predicts that traditional UI/UX design will be replaced by AX (Agent Experience) design. Unlike humans, AI agents don’t require visually appealing interfaces; they simply need minimal functional frameworks to complete tasks with full accuracy. This shift suggests future digital interfaces may prioritize machine-readable efficiency over human aesthetic preferences.
“UI/UX is a thing of the past. Next big thing will be AX. AgentExperience. Agents don’t need beautiful interfaces to interact with. They just need HTML as a way to get tasks done 100% accurately. https://x.com/AtomSilverman/status/1894936154216243712
One Lonely AI Visual: Week Ending March 07, 2025
“!! https://x.com/sama/status/1896651354648818121
Top 36 Links of The Week – Organized by Category
AGI
“I think this is the most insane part of 4.5 release to me. The knowledge cutoff is 2023 still. How do you even have a current pretraining run that didnt see data past 2023? So many API’s and libraries from there are now deprecated, and so many new ones created.. Did chatgpt 3.5” / X https://x.com/Teknium1/status/1895380611764015342
AgentsCopilots
“AI-first browsers are the way! Edge has been pushing this forward with built-in Copilot, smart browsing features, and seamless AI-powered tools. Great to see this convo happening. 🔗 https://x.com/yusuf_i_mehdi/status/1896603176230601144
“6 years ago AI pioneer and now Turing Award winner @RichardSSutton distilled 75 years of AI into a simple Bitter Lesson: general methods that scale with data and compute ultimately win. With the rise of AI agents it’s an important lesson to keep in mind: https://x.com/polynoamial/status/1897693005601292491
“mcp gradio client a proof of concept for implementing a Model Context Protocol (MCP) client using Gradio. It demonstrates how to interact with MCP servers using both STDIO and SSE communication methods within a Gradio interface. The Model Context Protocol (MCP) aims to https://x.com/_akhaliq/status/1898018002207228217
“coolest client that integrates with MCP that ive seen” / X https://x.com/hwchase17/status/1897757113885376808
Inception emerges from stealth with a new type of AI model | TechCrunch https://techcrunch.com/2025/02/26/inception-emerges-from-stealth-with-a-new-type-of-ai-model/
“Excited to be partnering with @deutschetelekom for a native Perplexity Assistant on their new AI Phone! https://x.com/AravSrinivas/status/1896565726112256369
T-Mobile’s parent company is making an ‘AI Phone’ with Perplexity Assistant | The Verge https://www.theverge.com/news/623164/t-mobile-ai-phone-perplexity-assistant-mwc-2025
“Anthropic has released a full guide to building AI Agents There’s both a YT video and a detailed blog post with all the necessary tips. Everything you need to know below (And links to resources) https://x.com/itsPaulAi/status/1893358959072801144
Amazon’s AWS forms new group focused on agentic AI https://finance.yahoo.com/news/amazons-aws-forms-group-focused-202225665.html?guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&guce_referrer_sig=AQAAANpBxZSlYwCxMgMHsjyqjYFBeZWXXhm2u78fcx–knI4rDUMMvn0u_CcCETrXDVhKK0UkF1huNWTd_aqobeLDvWRoKrN_6-CZ8d1h9gebamaVi0ajPBvbowXD9sWO-ZAFip5FwNhy-UtHwP-nDEv6jqQNKpl6IG_X7rWwP–izWA
Anthropic
Anthropic raises Series E at $61.5B post-money valuation \ Anthropic https://www.anthropic.com/news/anthropic-raises-series-e-at-usd61-5b-post-money-valuation
“Anthropic has raised $3.5 billion at a $61.5 billion post-money valuation, led by Lightspeed Venture Partners. This will advance our development of AI systems, deepen our understanding of how they work, and fuel our international expansion. Read more: https://x.com/AnthropicAI/status/1896606683876753470
Audio
Text-to-Speech Generator with 450+ AI Voices https://podcastle.ai/ai-voices
BusinessAI
“Our customers have saved 10+ hours per employee with Enterprise Pro. Sign up and get started today: https://x.com/perplexity_ai/status/1895554855168880918
SoftBank in talks to borrow $16 billion to fund AI, The Information reports | Reuters https://www.reuters.com/technology/artificial-intelligence/softbank-talks-borrow-16-billion-fund-ai-information-reports-2025-03-01/
“You can enable the connectors through your Perplexity Enterprise Pro account settings with simple one-clicks. https://x.com/AravSrinivas/status/1895555299798630755
EthicsLegalSecurity
“AI isn’t the future of media. It’s the foundation beneath all future media. This distinction matters profoundly. When we call something “the future,” we position it as a destination, something that will eventually arrive and replace what came before. AI systems aren’t simply the” / X https://x.com/c_valenzuelab/status/1898000296514928820
“The past 18 months have seen the most rapid change in human written communication ever By. September 2024, 18% of financial consumer complaints, 24% of press releases, 15% of job postings & 14% of UN press releases showed signs of LLM writing. And the method undercounts true use https://x.com/emollick/status/1895861786743881867
MetaAI
“Llama is giving @SevillaFC an edge in scouting the next wave of soccer stars. Together with IBM they created Scout Advisor — a generative AI-driven scouting tool designed and built on watsonx, with Llama 3.1 ➡️ https://x.com/AIatMeta/status/1895528149137629220
OpenAI
“ChatGPT for macOS can now edit code directly in IDEs. Available to Plus, Pro, and Team users. https://x.com/OpenAIDevs/status/1897700857833193955
“OpenAI released GPT-4.5, its largest model to date, but one that lacks the reasoning capabilities of recent models like o1 and o3. Learn more in The Batch: https://x.com/DeepLearningAI/status/1898086241859440934
“Introducing Sora, our text-to-video model. Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. https://x.com/OpenAI/status/1758192957386342435
Introducing NextGenAI | OpenAI https://openai.com/index/introducing-nextgenai/
OpenAI plans to bring Sora’s video generator to ChatGPT | TechCrunch https://techcrunch.com/2025/02/28/openai-plans-to-bring-soras-video-generator-to-chatgpt/
OpenSource
Did xAI lie about Grok 3’s benchmarks? | TechCrunch https://techcrunch.com/2025/02/22/did-xai-lie-about-grok-3s-benchmarks/
“Grok’s doctor mode is apparently just the prompt “you are a genius doctor who gives the world’s beat medical advice” applied to the base model?” / X https://x.com/emollick/status/1895504726017601638
Elon’s Grok 3 AI Provides “Hundreds of Pages of Detailed Instructions” on Creating Chemical Weapons https://futurism.com/elon-musk-grok-3-chemical-weapons
“📰More exciting news today: @xai’s latest Grok-3 tops the Arena leaderboard! 🔥 This is the newest, production model, grok-3-preview-02-24 With over 3k votes, this model is tied for #1 overall, and across Hard Prompts, Coding, Math, Creative Writing, Instruction Following, and https://x.com/lmarena_ai/status/1896675400916566357
Robotics
“Podcast is now live with Amit at Figure HQ We discuss founding Figure, Vettery, Archer, and Cover And the future of AI and robotics!” / X https://x.com/adcock_brett/status/1895894737909145733
“Nvidia presents: Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids Learning humanoid dexterous manipulation using sim-to-real RL, achieving robust generalization and high performance w/o the need for human demonstration https://x.com/arankomatsuzaki/status/1896371407774449921
“Microsoft released the most powerful vision language action model this week 🔥 MAGMA-8B can operate in both physical and digital world: embodied robots, web automation and more! 🤯 https://x.com/mervenoyann/status/1895497146344026184
“Brett Adcock talks about Figure parting ways with OpenAI to focus on training their own AI foundation models for humanoid robots. https://x.com/TheHumanoidHub/status/1896267571474911571
“In today’s release, Figure posted an in-depth article on all the AI advances for Helix → Implicit stereo vision → Multi-scale visual representation → Learned visual proprioception → Sport mode Worth reading if you’re interested in AI & Robotics: https://x.com/adcock_brett/status/1894913805848846582
“The largest company in our lifetime will be a humanoid robotics company…pod tomorrow https://x.com/adcock_brett/status/1895270649830219956
“Important update: Figure is launching robots into the home Our AI, Helix, is advancing faster than any of us anticipated, accelerating our timeline into the home Therefore, we’ve moved-up our home timeline by 2 years; starting Alpha testing this year https://x.com/adcock_brett/status/1895175400160133543
TechPapers
“Human: “Put the Ketchup away, where you think it belongs” (Ketchup was not present in the training set) https://x.com/adcock_brett/status/1897399420595134526