About This Week’s Covers

This week’s main cover celebrates my newsletter’s second birthday, with a nod to The Office episode where Dwight decorates for Kelly’s birthday using a simple sign that reads, “It is your birthday.”

In an effort to match that simplicity, I gave a screenshot to Gemini and said, “Change the text to ‘AI News 104: 2025/09/26,’” and used the first result.

The rest of the covers were created with my new image-generation rubric, where a short Python script asks me three questions about the theme, Claude Sonnet 4.5 writes the prompts, and Gemini 2.5 creates the images.

The Python script determined this week’s theme is: “Celebrating two years and 104 consecutive weeks of hand-curating AI newsletter links with pride and renewed energy.”
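For the curious, here’s a stripped-down sketch of what that rubric flow looks like. The questions, prompt wording, and model name below are illustrative placeholders rather than my actual script:

```python
# Rough sketch of the cover-rubric flow: ask three questions about the theme,
# then have Claude draft image prompts to hand off to the image model.
# The questions and the model id are placeholders, not the real ones.
import anthropic

QUESTIONS = [
    "What is this week's theme?",
    "What mood should the covers have?",
    "Any objects or motifs to include?",
]

answers = {q: input(q + " ") for q in QUESTIONS}

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
message = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder model id
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"Write six image-generation prompts for newsletter covers based on: {answers}",
    }],
)
print(message.content[0].text)  # prompts to paste into the image generator
```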

I’ve included my favorite six covers below:

This Week By The Numbers

Total Organized Headlines: 527

This Week’s Executive Summaries

This week’s newsletter marks my 104th week of publishing AI updates. Two years!

There’s quite a bit of news this week (perhaps a new record), and I’m excited to share it.

We’ll start with two major Google stories, then move into a substantial update on chips, hardware, and data centers, followed by ethics and alignment news. From there, we’ll shift to the future of the internet itself as browser use becomes more integrated with AI systems. After that, we’ll look at the latest model releases and then talk about open source. Before we move over to robotics, we’ll cover some business updates, augmented and virtual reality, and then close things out with video updates.

Two Major Google Stories

The top story in imagery news is Google Gemini 2.5. Google’s image tool generated 5 billion images in less than one month… a number that’s almost impossible to comprehend. This is directly tied to the strength and popularity of Google’s groundbreaking image creation tool, Nano Banana, now incorporated into Gemini 2.5. That’s 161,290,322 images a day, 6,720,430 an hour, 112,007 per minute, and 1,866 per second.
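If you want to sanity-check those figures, the arithmetic is simple (assuming a 31-day month and truncating fractions, the numbers above fall out directly):

```python
# Break 5 billion images per month into per-day/hour/minute/second rates.
# Assumes a 31-day month; fractions are truncated to whole images.
images_per_month = 5_000_000_000
per_day = images_per_month / 31
per_hour = per_day / 24
per_minute = per_hour / 60
per_second = per_minute / 60
print(f"{int(per_day):,}/day, {int(per_hour):,}/hour, "
      f"{int(per_minute):,}/minute, {int(per_second):,}/second")
# 161,290,322/day, 6,720,430/hour, 112,007/minute, 1,866/second
```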

Not to be outdone in anything this week, Google DeepMind released a profound project called Video Models Are Zero-Shot Learners and Reasoners. Think about that for a second… Google is saying that video models are not just text-to-video systems, but that they can learn and reason.

DeepMind’s Veo 3 is a generative video model that aims to do for vision what large language models did for text: become a general-purpose foundation model.

Instead of training separate models for each vision task, Veo 3 can handle them all. For example: perception, modeling environments, training a robot to manipulate an object, or even reasoning visually… figuring out how to navigate a maze just by looking at it.

Google’s Veo 3 can detect object edges, segment individual objects, and contextually understand and transform them within a scene.

It understands physical properties of materials… like flammability.

It knows the principles of gravity in different environments (Earth vs. the Moon). It understands buoyancy.

It can play Jenga… with objects of any type.

It can pack an object into a container. It understands optics like glass or mirrors. It can mix colors, subtract colors, and transfer styles from one object to another. It can in-paint objects into scenes and manipulate text inside an environment. It can even edit video using doodles drawn on top.

The list is almost unbelievable.

It continues past these “provincial” concepts into near IQ-test-level reasoning: solving Sudoku, predicting how water would flow through a variety of pipes, sorting numbers, playing Tetris, solving analogy problems. It’s absolutely mind-blowing and worth reading the paper.

The idea of combining such a powerful vision model with a text interface is one of those huge leapfrog moments that’s almost hard to wrap your head around.

Data Centers and Hardware

The big news in data centers is a follow-up to a July 2025 agreement where OpenAI and Oracle signed a deal to build up to 4.5 gigawatts of artificial intelligence computing infrastructure to accelerate the January 2025 Stargate roadmap. It hasn’t been a year yet, and things are moving quicker than initially planned.

This week, Oracle and OpenAI announced they will add five new U.S. Stargate data center sites and expand the project beyond the original agreement. With these five new sites, alongside the existing Abilene, Texas location, Stargate will have seven gigawatts of planned capacity.

Stargate currently represents over $400 billion in investment planned over the next three years with a goal of ten gigawatts of infrastructure.

The new Stargate sites will be in Texas, New Mexico, and Ohio, plus a yet-to-be-announced Midwest location. Three of the sites will be developed with Oracle and two with SoftBank. The project is expected to generate a combined 25,000 on-site jobs and thousands more across supporting industries. The Abilene flagship location is already operational on the Oracle Cloud and is supplying computing (I refuse to say “compute” too often) for large-scale model training and inference.

This week, NVIDIA announced it intends to invest up to $100 billion in OpenAI.

Under a bit of scrutiny, this can look like a circular investment between NVIDIA, Oracle, and OpenAI: OpenAI has committed $300 billion to Oracle for cloud computing. Since Oracle runs on NVIDIA GPUs, Oracle is going to buy billions in chips from NVIDIA. And NVIDIA just announced it’s investing $100 billion into OpenAI.

As Sully Omar points out on Twitter, OpenAI is going to use NVIDIA’s money to pay Oracle, who’s paying NVIDIA, who’s then reinvesting it in OpenAI.

Not to be left out, Elon Musk tweeted “Just as we will be the first to bring a Gigawatt of coherent training compute online, we will also be the first to 10GW, 100GW, 1TW,…”

Ethics and Alignment

Moving on to Ethics and Alignment news, OpenAI released a very strong new benchmark that measures AI’s ability to perform real-world, economically valuable tasks.

GDPval reflects authentic work deliverables that professionals produce in their jobs: documents, spreadsheets, slide decks, reports, CAD drawings, audio and video assignments and more.

The benchmark evaluates tasks from 44 different occupations across the nine economic sectors that contribute the most to U.S. GDP. In total, the benchmark includes 1,320 tasks, along with a listing of the most important 220 tasks called the Gold Subset.

OpenAI is offering a public automated grading service that allows anyone to test their models across these tasks.

According to the initial findings, frontier AI models are progressively getting very good at these real-world tasks, and AI performance is improving linearly over time.

For many of the tasks, the best models are approaching the quality of experienced human professionals (as judged anonymously by industry experts). Notably, AI can complete these tasks much faster and more cheaply… sometimes as much as 100x faster and 100x cheaper.

One noteworthy detail: OpenAI was completely transparent that Anthropic’s Claude is outperforming OpenAI’s own model on the benchmarks. This is the third time this year that OpenAI has specifically acknowledged that Claude is better in a benchmark. In this case, Opus 4.1 beats GPT-5 High on GDPval. OpenAI is getting praise across the industry for being proactively candid with the results.

The absolute worst idea of the week is a new feature from Meta called Vibes, which is a newsfeed for AI-generated videos being rolled out inside the Meta AI app. It’s pretty self-explanatory… a giant pile of slop that people can then crosspost to Instagram and Facebook. I’m pretty sure this is why we can’t have nice things.

Two weeks ago, OpenAI published a report on how people are using ChatGPT, but I forgot to include it in my summary. I think it merits inclusion this week.

The study is the largest analysis to date of consumer ChatGPT usage, based on a privacy-preserving sample of 1.5 million conversations collected over the last few years. It looks at global usage across demographics, drawn from ChatGPT consumer plans.

Usage has moved well beyond early adopter groups and has expanded widely. The gender gap in particular has narrowed, with users appearing to be 52% female (based on first names). Lower-income and middle-income regions have been growing incredibly quickly, with adoption in the lowest-income countries growing 4x as quickly as in the highest-income countries.

The majority of conversations fall into three major categories: practical guidance, seeking information, and writing. Coding and programming remain relatively small niches compared to these larger use cases.

About 50% of usage is users asking questions or seeking advice. Another 40% is task-oriented usage, such as drafting text, planning, or executing tasks. The remaining 10% is personal reflection or playful use. Around 30% of consumer use is work-related, though non-work use is growing faster.

Overall, GPT is used as a multi-purpose assistant…what we could call Search 2.0… tackling questions that traditional search isn’t equipped to answer as efficiently or conveniently.

Google DeepMind released the third iteration of their Frontier Safety Framework. The framework was initially created to identify when AI models approach capability levels where they could pose serious real-world risks… unless strong mitigation is put in place. This new version of the safety framework is a major update driven by the speed at which frontier AI models are advancing. For everyone saying “AI has plateaued,” this is a poker tell to the contrary.

DeepMind’s third version of the framework expands the risk map to include harmful manipulation, shutdown resistance, and combinations of skills that become risky when they interact. The new version also tightens security and deployment rules, including when a model must be restricted, supervised, or held back from public release.

In regulatory news, Jan Leike posted an interesting thread on Twitter noting that a new $100 million–plus pro-AI industry SuperPAC is being backed by Andreessen Horowitz, Greg Brockman, and other industry leaders. The SuperPAC was formed to fight AI regulation using the same political strategy that helped the pro-crypto PAC Fairshake: avoid talking about AI directly (too distracting) and instead run ads supporting candidates who oppose AI regulation, or attacking candidates who support it, while focusing on tangential issues voters actually care about. A “We don’t talk about Bruno” strategy.

The tension between AI regulation and acceleration hasn’t come up in the headlines much lately, but it has persisted behind the scenes. Jan (the guy who tweeted the story) is on the alignment team at Anthropic.

Anthropic is getting beat up lately by both the White House and accelerationists. I’m a fan of Anthropic, and I see them as a bit of a necessary underdog to create tension and a counterpoint to the rabid spending and pace of model releases. I think it’s healthy.

It’s notable this week that, despite many accelerationists saying “Anthropic will never make it,” Claude is number one on OpenAI’s GDPval (see above).

The UK government announced that it leveraged a new AI anti-fraud tool to recover nearly £500 million in the last year. That’s the largest amount the UK has ever recovered in a single 12-month period. The government estimates it lost more than £7 billion during the pandemic, and of the money reclaimed this year, £186 million was tied directly to COVID-19 fraud.

The success of the AI fraud detection tool is rooted in an ability to cross-reference data across multiple government departments and spot patterns and inconsistencies. The system is officially called the Fraud Risk Assessment Accelerator and is now in place to help identify vulnerabilities before scammers can exploit them. The UK says the tool has proactively blocked hundreds of thousands (!?) of potentially fraudulent companies from dissolving to avoid repaying bounce-back loans.

The UK plans to license the anti-fraud technology internationally, with the US, Canada, Australia, and New Zealand expected to adopt it in some form.

Civil liberties organizations have expressed concerns about the tool, citing examples where similar government-run AI systems… such as a welfare fraud detector… displayed biases related to age, disabilities, marital status, and nationalities. Groups like Amnesty International caution that without strong oversight, continued use of AI could create unfair outcomes and harm vulnerable populations.

Spotify announced that it will begin labeling AI-generated music and do a better job filtering spam.

Anthropic released a fascinating report summarizing a mysterious performance issue that occurred in August and early September. Claude users started noticing that the model was acting strangely… giving poor-quality answers, drifting off-topic, and providing responses that didn’t feel as strong as usual. Because the issues were intermittent, Anthropic didn’t fully realize anything was wrong until the error volume finally started deviating from the baseline.

Anthropic discovered there were three separate bugs happening at the same time, deep in the infrastructure. It wasn’t an issue with the model’s training or alignment, but rather the underlying technology and hardware:

The first bug was sending requests to the wrong type of server. The second bug was corrupting outputs on the chips themselves, causing the model to spit out odd symbols or garbled text. The third bug was a compiler issue that occasionally caused the system to pick the wrong next token (dropping the most likely word), which subtly threw the model off course when responding.
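To get a feel for why that third bug was so hard to spot, here’s a toy illustration (my own sketch, not Anthropic’s code): if the sampler occasionally loses the single most likely token, the output stays fluent, it just quietly gets a little worse.

```python
import random

# Toy illustration of a token-selection bug: a healthy picker always takes the
# highest-probability candidate; the buggy one intermittently drops it and
# falls back to the runner-up. Not Anthropic's actual code.
def pick_token(candidates, buggy=False):
    # candidates: list of (token, probability)
    ranked = sorted(candidates, key=lambda t: t[1], reverse=True)
    if buggy and random.random() < 0.3:  # bug fires only some of the time
        ranked = ranked[1:]              # best token silently discarded
    return ranked[0][0]

candidates = [("Paris", 0.62), ("London", 0.21), ("Lyon", 0.09), ("banana", 0.08)]
print("healthy:", pick_token(candidates))              # always "Paris"
print("buggy:  ", pick_token(candidates, buggy=True))  # sometimes "London"
```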

All three bugs have since been fixed, and Anthropic published a long, detailed postmortem explaining the timeline, the technical issues, and what they’ve done to avoid this in the future. It’s a fascinating look under the hood if you’re interested.

Wrapping up the Ethics and Alignment section, Wharton Professor Ethan Mollick shared two studies worth skimming. “These two papers argue that a true AGI-level AI (equivalent to a human genius), if achieved, would eventually displace most human labor and reduce the economic value of remaining human work to near-zero.”

The Future of The Internet Itself

Now, let’s move to the “future of the Internet” headlines.

Ethan Mollick also shared observations on Google’s recent “AI Mode” for search. As usual, he does a great job capturing just how silly and fragmented the naming conventions are for what are otherwise strong tools.

“Google AI Mode has gotten very useful when I wasn’t looking. An iterative dialogue about search turns out to be surprisingly helpful for many kinds of search, and it is really fast. I suspect a lot of people don’t know that AI Mode exists & is different from all the other Google AI offerings because of the confusing name and many disconnected AI experiences: ‘No, not AI Overviews. Nope, not Gemini. Not NotebookLM. Not the Gemini summary of a webpage…’”

Google released a preview of a new tool for Chrome called DevTools MCP. MCP is the Model Context Protocol, Anthropic’s open standard (yes, Google is using Anthropic’s spec) that lets AI agents connect to external tools and data sources. The Chrome DevTools MCP is a specialized version that exposes the full power of Chrome’s DevTools to AI agents.

Historically, AI coding assistants could generate code, but they couldn’t see what happened when the code actually ran in a browser. Now, an AI assistant can launch an actual Chrome browser, run the code, navigate webpages just like a user or developer, and observe the real behavior.

Chrome’s DevTools MCP can open pages, click buttons, fill out forms, and simulate user interactions, all while inspecting the DOM and CSS, checking layout and styling, and examining console outputs and JavaScript errors. It can run performance traces, capture page load and rendering times, monitor network requests… all sorts of things. It can even capture screenshots and emulate different devices or network conditions. It can debug API issues in a full browser QA environment.

Because the tool uses the open MCP standard, it’s compatible with coding agents like Claude Code, Cursor, and GitHub Copilot, and isn’t stuck with a single proprietary Google integration.
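For the curious, wiring an agent up to the DevTools server is ordinary MCP plumbing. Here’s a rough sketch using the MCP Python SDK; the server package name (chrome-devtools-mcp) and the exact SDK calls are my best reading of the docs, so treat it as illustrative rather than official:

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch the Chrome DevTools MCP server over stdio (package name assumed).
    server = StdioServerParameters(command="npx", args=["chrome-devtools-mcp@latest"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            # Each exposed tool (navigate, click, record a performance trace, ...)
            # is now callable by whatever agent is driving this session.
            print([tool.name for tool in tools.tools])

asyncio.run(main())
```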

One cool related observation from Philipp Schmid, an AI developer at Google DeepMind, is that you can combine Browser Use (actually another tool from another company) with code execution to have Gemini 2.5 control your browser and write dynamic JavaScript to extract data from pages. He shared an example of how Gemini 2.5 uses a JavaScript tool and writes a “script” to extract links from tags on the websites it visits.
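To make that concrete, the pattern looks something like the sketch below. I’m using Playwright here purely as a stand-in for Browser Use, just to show the idea of executing a small, dynamically written JavaScript snippet inside a live page:

```python
# Sketch of the pattern: drive a real browser, then run a generated JavaScript
# snippet in-page to pull out structured data (links, in this case).
# Playwright is used for illustration only; it is not the Browser Use tool.
from playwright.sync_api import sync_playwright

extract_links_js = """
() => Array.from(document.querySelectorAll('a[href]'))
          .map(a => ({text: a.innerText.trim(), href: a.href}))
"""

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")
    links = page.evaluate(extract_links_js)  # run the generated JS inside the page
    browser.close()

for link in links[:10]:
    print(link["href"], "-", link["text"])
```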

This is an entirely new world for creative problem solvers and, ironically, yet another long-term harbinger of the end of the internet as we know it.

In “the future of SEO” news, Ethan Mollick shared a screenshot of himself asking ChatGPT to help him find an over-the-counter pain medicine for his headache. ChatGPT thought for seven minutes before coming back with a recommendation for Tylenol Extra Strength Rapid Release Gel Caps and suggested he purchase them on Target.com. GPT even offered to sign in to his Target account and purchase the medicine for him.

Businesses will have to figure out the new world of search engine optimization inside chats. It’s going to be tough for marketers to understand or organically influence an AI assistant’s recommendation process.

Perplexity introduced an email assistant that turns your inbox into actionable lists. It plugs into Gmail or Outlook, and once connected, it starts organizing your inbox automatically. It tags your emails so you can see what needs a reply and what’s informational. As you open messages, the assistant creates automatic replies using your tone and writing style as it learns. It also integrates with calendars to help with scheduling: the assistant can handle the back-and-forth, suggest times, and put the meeting on your calendar for you.

Perplexity says the assistant doesn’t log any actual email content, and everything it handles is deleted after 14 days. I could see this being incredibly helpful as assistants improve. I’m just not sure which brand of assistant to commit to at this point. I’m surprised Google hasn’t figured this out. I haven’t had a chance to try Microsoft’s option, but I’m looking forward to it.

Perplexity also announced a Search API, which gives developers access to the same global search infrastructure that powers Perplexity’s Answer Engine. Historically, developers have been stuck without access to the “cached internet”, since traditional search engines keep their indexes private. Perplexity’s API connects developers to an index of hundreds of billions of websites, allowing apps to pull information from across the internet through one interface. It’s a bit mind-boggling.

Perplexity also released an open-source benchmark called SearchEval, so third parties can test their tools against Perplexity’s performance.

Lastly, Perplexity launched an enterprise plan for power users called Perplexity Enterprise Max. This is the most expensive offering from Perplexity; it takes Perplexity Max’s personal features and adds enterprise security and IT controls. Businesses with Enterprise Max can perform unlimited deep-dive research reports with 100+ citations each, and can use the Labs tool to analyze massive datasets, generate visualizations and dashboards, and even prototype tools with no usage caps.

Speaking of “the end of the internet” themes, the Max tool integrates with Comet, Perplexity’s web browser, and has a fairly large file capacity, supporting up to 10,000 files in the chat workspace. Max also includes early access to Perplexity’s new email assistant. Enterprise Max costs $325/month per user.

In more affordable news, OpenRouter is a tool that lets you use over 500 language models through a single interface. This week OpenRouter announced a web search feature that lets you add real-time web searching to any of the large language models it supports… even models that don’t natively fetch live data. The results come with inline citations and URL metadata.

OpenRouter’s web search plugin allows any of the much cheaper models to pull recent information and produce source-backed answers within chats. It’s potentially powerful for researchers, journalists, developers, or anyone who needs fresh, grounded content on a budget. It’s also yet another warning sign for the last bastions of the browser-based internet.
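Here’s roughly what that looks like in practice. OpenRouter speaks the OpenAI-compatible API, and as I understand it appending “:online” to a model name is how the web search plugin gets switched on, so take the details here as a sketch rather than gospel:

```python
# Sketch: call an inexpensive model through OpenRouter with web search enabled.
# The ":online" suffix and the example model slug are my assumptions; check
# OpenRouter's docs for the exact syntax.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct:online",  # cheap model + live search
    messages=[{"role": "user", "content": "What AI models were released this week?"}],
)
print(response.choices[0].message.content)  # answer with inline citations
```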

OpenAI launched ChatGPT Pulse, a feature that pivots ChatGPT from a reactive chatbot… into a proactive news and information assistant. Instead of waiting for you to start a chat, every morning Pulse delivers a personalized daily briefing based on past chats, preferences, and your calendar or email connections. Pulse consists of a set of 5–10 updates, and can include news briefings, meeting reminders, recommended reading, dinner ideas, or follow-ups to open conversations.

For now, Pulse is only available to Pro users. I’ve been using Pulse, and I find it uncannily helpful and smart… it genuinely adds value to my day and freaks me out a bit.

Continuing with the “end of the browser” news, Amazon released an extension for their popular web browsing action model, NOVA Act. The core product, Amazon NOVA Act, is an ‘agent framework’ that allows developers to build agents that can take actions in a web browser, performing tasks like filling out forms, clicking through pages, extracting data, or running QA workflows.

This week’s update to NOVA Act enables web browser agent development, testing, and debugging inside popular IDEs like Visual Studio Code, Cursor, and Kiro. One of the key features is chat-to-script generation, where you can describe in plain language what you want your agent to do (like search for tickets or update a social media status, hello spam!) and the extension will generate the initial functional script.

The extension also includes a section-by-section task builder mode that makes it easy to understand and debug each agentic step. Every time you make a change, you can build and test your automation one piece at a time, editing only what you want, running small chunks, and inspecting the results live. There’s a live browser debugging tool that lets you watch the agent work and see output logs as it goes. The extension builds on top of NOVA Act’s SDK, so scripts can include extras like Python and API calls.

Amazon has been working on browser-use agent tools for quite some time, and I’m interested in seeing where they go with this, considering they have a ton to gain or lose as e-commerce becomes increasingly integrated into chat interfaces.

Model Releases, Feedback, and Benchmarks

Each week, we get additional feedback on models that have been out for a while. In this case, GPT-5 continues to rack up reports of strong performance.

Two years ago, a benchmarking group created the “world’s hardest software design quiz”. This includes five multiple-choice questions, and only about 3% of software engineers can solve them. The average human score remains somewhere between 2 and 3 out of 5. At the time it launched, all of the best frontier models failed, with even GPT-3 getting 0 out of 5.

This week, GPT-5 scored 4 out of 5, while Claude Opus 4 still only gets 2 out of 5 correct. That’s the “jagged frontier” in action. Depending on the benchmark, each model stacks up remarkably differently. But the trend is still consistent improvement across the board.

In another notoriously tough math benchmark, GPT-5 was able to solve 3 out of 5 math optimization conjectures (it happens to be another five-question test, which makes the two results easy to confuse). However, in this benchmark, the math questions are considered “minor open math problems”: questions that would normally require a few days of work by a good PhD student. The consensus is this implies that GPT-5 isn’t just solving textbook problems that it memorized; in fact, GPT even produced a valid proof that differed from the one the researchers knew about!

Continuing with model news, xAI announced Grok 4 Fast, a version of their flagship model that’s redesigned to deliver similar frontier-level performance as Grok 4, but at lower cost and with much faster speed.

The Grok 4 Fast model has a 2 million–token context window, which is great for long documents, conversations, and codebases. But what makes it stand out is that Grok 4 Fast reduces the number of thinking tokens by 40%, which translates into a dramatic drop in cost. xAI claims up to 98% savings for a comparable task compared with ‘regular’ Grok 4.

Grok 4 Fast has Grok 4’s agentic capabilities and supports tool use and web search, as well as real-time browsing (that’s a trend as you can tell this week) and retrieval, allowing it to pull up-to-date information. It’s accessible via xAI’s API, which is likely where people will leverage it as an affordable tool.

Over the past few weeks, an AI model called Kimi from a parent company called Moonshot has been getting a lot of attention.

Kimi actually came out two years ago, but the new model K2 launched in July. I recently added a category for Moonshot, the parent company, because it deserves its own section on the website.

This week, Kimi announced Agent Mode, aka OK Computer.

“OK Computer” allows chat-based development (aka prompt generation that’s even simpler than “vibe coding”) of multi-page websites (!), mobile-first designs, and editable slides for presentations.

OK Computer is multimodal, supports up to one million rows of data, can self-scope out a project plan, and can navigate tools on its own… using file systems, a browser, the internet (again!), and even interacting with a terminal interface. It effectively has its own computer built into it that you can access through the cloud.

One example of OK Computer from Twitter showed a user generating a good-looking and viable e-commerce website with a single one-sentence prompt. If you don’t already follow Kimi, I recommend putting it on your radar.

Another lesser-known model that is an absolute powerhouse is called Qwen.

Qwen was released two years ago by Alibaba and is mostly open-sourced. They are now on their third version.

Wait until you hear how much happened this week with Qwen… Alibaba dropped a huge wave of new models across speech, image editing, multimodal reasoning, safety, and vision-language:

First is Qwen3-TTS-Flash, a text-to-speech model focused on natural, fast voice generation. It’s stable in Chinese and English and performs at the top of the charts across multiple languages. It offers 17 voices across 10 languages, supports nine Chinese dialects (!), and delivers audio output in 97 milliseconds. It’s designed to support everything from apps and games to content creation and real-time customer service.

Next is Qwen-Image-Edit-2509, an upgrade to Qwen’s image model. It supports multi-image editing, blending people, products, and scenes. In single-image mode, it can keep faces consistent and preserve brand or product identities. It also offers control over text, fonts, colors, and materials. It includes a built-in ControlNet for depth maps, edges, and key points!

The biggest announcement is Qwen3-Omni, an end-to-end multimodal model that unifies text, images, audio, and video. It can understand 30 minutes of audio, supports 119 languages for text, 19 for speech input, and 10 for speech output. It supports built-in tool use and has an open-source captioning model.

Another release is Qwen’s vision model, Qwen3-VL. The news here is it’s both open-sourced and outperforms Gemini 2.5 Pro on key vision benchmarks!

Qwen3-VL can operate user interfaces on computers and phones, turn screenshots into actions, and complete real-time, real-world tasks. It can handle visual coding and convert a screenshot directly into HTML, CSS, or JavaScript. It scales to a 1 million–token context window, handles long videos and PDFs, and supports 2D and 3D spatial reasoning. Qwen3-VL delivers strong optical character recognition across 32 languages and excels at STEM reasoning when placed into thinking mode. All open sourced.

Lastly, this week Qwen released Qwen3-Guard, a safety moderation model specifically built for real-time global AI governance. It supports 119 languages, comes in three different sizes, and includes a streaming version for real-time safety checks.

These models from Qwen might be the most incredible set of updates this entire week, but because they are a bit niche, I buried them. If you don’t know Qwen, I highly recommend poking around.

Open Source

Switching to Open Source news, Ethan Mollick observed, “Two years ago no model had surpassed GPT-4 and it wasn’t clear that was possible. Now you can get better than GPT-4 level performance on open weights models running on consumer hardware, and the state of the art in LLMs is cheaper & faster and very much more capable than GPT-4.”

Benjamin Turtel shared a chart demonstrating China’s dominance in open source model performance.

Robotics

Last week, Figure Robotics launched a partnership with Brookfield, which manages $1 trillion in assets. Figure is going to use Brookfield’s 100,000 residential units to gather real-world data that will help their training model, Helix, learn about environments and commercially deploy humanoid robots.

Figure’s goal is to enable plain-language commands to robots that truly understand their surroundings… like, “Go to the refrigerator and tell me if we have any eggs.”

This week, the Figure partnership with Brookfield has been formalized into a larger project called Project Go Big, which is Figure’s effort to build the world’s “largest humanoid pre-training dataset”. Part of this involves teaching the Helix model using videos of humans performing real-world actions, again tying back to access across Brookfield’s 100,000 residential units.

At the same time, this week Figure announced over $1 billion in Series C funding.

It’s been fun to watch Figure progress over the past two years. A year ago, I would have said that Neo from 1X was going to be the big winner. But Figure has kept itself in the lead, even ahead of Tesla’s Optimus, which might surprise some people.

Unitree has done a lot of amazing things, and NVIDIA is building insanely powerful world simulation models for training robots, but to my knowledge, NVIDIA does not make robotics hardware.

Hot on the heels of Figure, Google DeepMind released Gemini Robotics 1.5, a model that gives robots the ability to see, understand, plan, reason, and act in a real physical environment.

What’s wild is that Gemini Robotics 1.5 is built as an extension of the same multimodal foundation that DeepMind’s regular AI models use… the models that handle text, image, video, and audio.

This new version extends that same architecture so robots can interpret the real world and perform physical actions, not just answer questions or generate text.

This unification of skills in models (we’re seeing it across the news this week) has been a theme in my presentations since September 2023, and it’s amazing to watch it all come together as the “total becomes more than the sum of the parts”…

Much like Figure, Google is hoping to push robotics from scripted, narrow, pre-planned tasks toward general-purpose, multi-step, unpredictable tasks like sorting laundry, packing groceries, or cleaning a room, all with plain language command interfaces.

Gemini Robotics 1.5 uses a “vision-language architecture”: it combines visual inputs with natural language instructions, locally hosted internal reasoning, and then outputs motor commands to control the robot.

Robotics 1.5 attempts to think before acting… generating an internal step-by-step plan and breaking down high-level goals into a sequence of smaller steps, including adapting to changes in the environment along the way.
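Conceptually (and this is just my mental model, not DeepMind’s architecture or API), the loop looks something like this: observe, keep or revise a step-by-step plan, then emit the next low-level action.

```python
# Toy mental model of a "think before acting" robot loop. Purely illustrative.
from dataclasses import dataclass, field

@dataclass
class Robot:
    goal: str
    plan: list = field(default_factory=list)

    def observe(self) -> str:
        return "laundry basket in view"  # stand-in for camera input

    def think(self, observation: str) -> None:
        if not self.plan:  # break the high-level goal into smaller steps
            self.plan = ["walk to basket", "pick up shirt", "sort by color"]
        if "obstacle" in observation:  # adapt the plan when the world changes
            self.plan.insert(0, "step around obstacle")

    def act(self) -> str:
        return self.plan.pop(0)  # next motor-level command

robot = Robot(goal="sort the laundry")
for _ in range(3):
    robot.think(robot.observe())
    print("action:", robot.act())
```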

Much like autonomous vehicles (aka Google’s Waymo in this case), Google’s 1.5 robotics model supports ‘knowledge transfer’ from one type of robot to another without retraining.

This is even more complex than driverless cars because the model can transfer knowledge across multiple robot types… different robot architectures, shapes, styles, sizes of limbs, heights, etc.

It must be terrifying to be a startup trying to break new ground the way Figure is doing while behemoths like Google can simply leapfrog ahead at a moment’s notice. We’re going to see incredible breakthroughs in the next 12 months in robotics.

As luck would have it, 1X is in the news this week too!

The Information reported: “Humanoid robotics startup 1X has told investors and employees it is trying to raise as much as $1 billion, according to three people who spoke to CEO Bernt Børnich. The decade-old company is aiming for a valuation of at least $10 billion, Børnich said, or more than 12 times its previous valuation from a January financing, according to one of the people.”

Last in robotics news, and along the same lines as Helix and Gemini Robotics 1.5: “Skild AI’s omni-bodied brain, trained on 100,000 diverse simulated robots for 1000 years, enables remarkable real-world adaptability. In-context adaptation allows the brain to discern the robot form and adapt to extreme changes like chopped limbs or walking on stilts.”

A pretty dramatic demo video shows a robot dog with long legs getting the legs sawed down to shorter, haphazard lengths, and the robot continues walking without a problem and without having to relearn the movements.

Business News

NVIDIA announced that it will be investing in ElevenLabs.

OpenAI, SAP, and Microsoft are launching OpenAI for Germany, an effort to build a sovereign, certified cloud environment that Germany’s public sector can use to run frontier models.

Microsoft has also announced that it’s going to build an AI marketplace for publishers to create a means of compensation when content is used in AI search results.

Meta has poached one of the strongest minds at OpenAI: Yang Song, the world’s top diffusion model researcher and the inventor of the consistency model.

Meta also poached Tesla’s AI lead for Optimus, who is joining Meta as a research scientist.

Finally, in business news, the U.S. General Services Administration partnered with xAI on a steep discount to use xAI’s Grok 4 reasoning models. The agreement will be in place for 18 months and is part of a larger effort to accelerate federal AI adoption.

Augmented Reality and World Simulation News

Thanks to a guy named Bilawal Sidhu, about two years ago I learned about something called Gaussian Splatting.

Gaussian Splatting is essentially a technique that can stitch together an array of still photos into a fully immersive 3D world, converting what the photos capture into millions of soft, semi-transparent blobs (“splats”) and blending them into a continuous scene.

Gaussian Splatting technology can be used to generate spaces to train robots in simulation or to create highly realistic augmented or virtual reality experiences for humans to explore.

Stitching together still frames builds a kind of binocular view of space, allowing the tech to segment objects and understand their depth and distance.
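If you want a feel for what “blending” means here, the heart of Gaussian Splatting’s rendering is front-to-back alpha compositing of depth-sorted, semi-transparent blobs. A bare-bones sketch of that single idea (not a real renderer):

```python
# Bare-bones sketch of the blending at the core of Gaussian Splatting:
# sort the splats covering a pixel by depth, then composite front to back,
# i.e. color = sum_i c_i * a_i * prod_{j<i} (1 - a_j). Not a real renderer.
def composite(splats):
    # splats: list of (depth, color, alpha) covering a single pixel
    color, transmittance = 0.0, 1.0
    for _, c, a in sorted(splats, key=lambda s: s[0]):  # nearest splat first
        color += transmittance * a * c
        transmittance *= (1.0 - a)
    return color

pixel_splats = [(2.0, 0.9, 0.6), (1.0, 0.2, 0.5), (3.0, 0.7, 0.8)]  # depth, gray, opacity
print(round(composite(pixel_splats), 3))  # 0.482
```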

Segmentation and depthing are frequent topics in this newsletter (as far back as AI News #12: Week Ending 12/22/2023 and AI News #44: Week Ending 08/02/2024).

From a presentation I gave that talks about segmentation

Last week Meta released its Ray-Ban glasses with augmented and virtual reality built in.

One of the standout features of these Meta glasses is how good they are at creating Gaussian splats… simply by turning your head and capturing your environment with the camera.

The Meta glasses can create a virtual twin of your house or wherever you are, including outdoor spaces.

While wearing the glasses, you can start blending the virtual and real worlds with rich overlays or even remote experiences.

For robotic training, you could teach a robot everything it needs to know about your house before it even arrives.

A second neat highlight from last week’s Meta glasses demo was Mark Zuckerberg silently writing a message using the Meta neural band, an accessory that works with the glasses as an interface (like a mouse and keyboard). Mark simply twitched his fingers, and a message appeared as text in his field of view in the glasses.

Ensuring Meta can’t rest on its laurels, this week NVIDIA released a new system that does almost the exact same thing with Gaussian Splatting. What are the odds?

NVIDIA released a system called Lyra that can generate fully 3D scenes… and sometimes 4D ones, meaning scenes that evolve over time… starting from just a single image, video, or text prompt.

Lyra uses a video diffusion model as a “teacher” to examine each image and distill it into a 3D representation. Lyra does essentially the same thing as the Meta headset: it converts everything into a 3D Gaussian Splat representation.

The technology is much easier to understand if you skim the project page, and you can clearly see these technologies converging, especially across robotics, augmented reality, and world simulation.

In other smart glasses news (who knew it would make a comeback), The Verge and The Information report that “OpenAI might be developing a smart speaker, glasses, voice recorder, and a pin.”

Video News

Speaking of Bilawal Sidhu (my Gaussian Splatting guy and head of TED AI, btw), Bilawal shared a great video this week with the caption: “What used to take hours in After Effects now takes just ONE prompt. Nano Banana, Seedream 4, Wan 2.2, Runway Aleph et al are pioneering instruction-based editing — collapsing complex VFX pipelines into a single, implicit step.”

Luma announced: “This is Ray3. The world’s first reasoning video model, and the first to generate studio-grade HDR. Now with an all-new Draft Mode for rapid iteration in creative workflows, and state of the art physics and consistency. Available now for free in Dream Machine.”

Bilawal also posted about a new release from video company Higgsfield. Bilawal wrote “Multi-camera shot generation will be a button in every video editor” and he shared a video example of Home Alone and Breaking Bad edited using the demo.

AI video company Runway released a major update this week. The headline is a bit technical… “Today we’re sharing our first research work exploring diffusion for language models: Autoregressive-to-Diffusion Vision Language Models”… but I can explain it in plain English.

Runway is, at its core, a video generation company, and the entire business hinges on fast, high-quality multimodal generation. Consistent video generation requires producing many frames in a row, with detailed content that must stay the same over time: characters, backgrounds, lighting, and motion. The bottleneck in a video model isn’t capability… it’s speed. Generating videos frame-by-frame is expensive and impractical… that’s a surreal and sci-fi thing to say, if you think about it, since that’s literally been the definition of video since its creation.

Runway’s announcement is a roadmap for how they hope to keep improving over the next few months and years. They’re taking their old, slow (frame-by-frame) model and converting it into a faster model using a trick that has been successful in speeding up image and text models. Just like text diffusion can speed up the output of an LLM, video frame diffusion can speed up the rendering of video output. The announcement page has lots of visual aids to understand it, if you’re interested.

This Week’s Humanities Reading

Since this week marks my blog’s second birthday, the humanities reading is a passage by the Stoic Seneca, on death and time’s passing:

For we are mistaken when we look forward to death; the major portion of death has already passed. Whatever years lie behind us are in death’s hands. – Seneca

Continue to act thus, my dear Lucilius—set yourself free for your own sake; gather and save your time, which till lately has been forced from you, or filched away, or has merely slipped from your hands. Make yourself believe the truth of my words,—that certain moments are torn from us, that some are gently removed, and that others glide beyond our reach. The most disgraceful kind of loss, however, is that due to carelessness. Furthermore, if you will pay close heed to the problem, you will find that the largest portion of our life passes while we are doing ill, a goodly share while we are doing nothing, and the whole while we are doing that which is not to the purpose. What man can you show me who places any value on his time, who reckons the worth of each day, who understands that he is dying daily? For we are mistaken when we look forward to death; the major portion of death has already passed. Whatever years lie behind us are in death’s hands.

Therefore, Lucilius, do as you write me that you are doing: hold every hour in your grasp. Lay hold of to-day’s task, and you will not need to depend so much upon to-morrow’s. While we are postponing, life speeds by. Nothing, Lucilius, is ours, except time. We were entrusted by nature with the ownership of this single thing, so fleeting and slippery that anyone who will can oust us from possession. What fools these mortals be! They allow the cheapest and most useless things, which can easily be replaced, to be charged in the reckoning, after they have acquired them; but they never regard themselves as in debt when they have received some of that precious commodity,—time! And yet time is the one loan which even a grateful recipient cannot repay.

You may desire to know how I, who preach to you so freely, am practising. I confess frankly: my expense account balances, as you would expect from one who is free-handed but careful. I cannot boast that I waste nothing, but I can at least tell you what I am wasting, and the cause and manner of the loss; I can give you the reasons why I am a poor man. My situation, however, is the same as that of many who are reduced to slender means through no fault of their own: every one forgives them, but no one comes to their rescue.

What is the state of things, then? It is this: I do not regard a man as poor, if the little which remains is enough for him. I advise you, however, to keep what is really yours; and you cannot begin too early. For, as our ancestors believed, it is too late to spare when you reach the dregs of the cask.

Of that which remains at the bottom, the amount is slight, and the quality is vile. Farewell.

Full Executive Summaries with Links, Generated by Claude Sonnet 4

Google’s Gemini app processes 5 billion images in under one month
The rapid adoption demonstrates mainstream appetite for AI image generation, with users creating novel content like “retro selfies” holding baby versions of themselves. This processing volume in such a short timeframe suggests AI image tools are moving from novelty to everyday utility faster than previous AI applications.

🍌 @GeminiApp just passed 5 billion images in less than a month. What a ride, still going! Latest trend: retro selfies of you holding a baby version of you. Can’t make this stuff up! https://x.com/joshwoodward/status/1970894369562796420

OpenAI and Nvidia announce $100 billion partnership for massive AI infrastructure
Nvidia will invest up to $100 billion in OpenAI to deploy 10 gigawatts of AI computing power—equivalent to 5 million chips and a full year of Nvidia’s total production. This creates a circular business model where Nvidia invests money that OpenAI uses to buy Nvidia chips, potentially inflating valuations while deepening their strategic partnership. The deal represents the largest tech infrastructure commitment in history and signals both companies are betting everything on achieving superintelligence within years.

10GW is about $340B of nvidia h100 at $30k/gpu (assuming 20% of power for non-gpus). if openai got a 30% volume discount, they’d pay nvidia $230b probably. so instead, maybe openai pays nvidia full price and nvidia invests the excess $100B into openai stock 😬 (just throwing… https://x.com/soumithchintala/status/1970464906072801589

Grateful to Jensen for the almost-decade of partnership! https://x.com/sama/status/1970483993486217258

looking forward to what we’ll build together with NVIDIA! https://x.com/gdb/status/1970299081999426016

More compute in the making. Announcing 5 new Stargate sites with Oracle and SoftBank, putting us ahead of schedule on the 10-gigawatt commitment we announced in January. https://x.com/OpenAI/status/1970601342680084483

OpenAI & NVIDIA Announce Strategic Partnership to Deploy 10GW of NVIDIA Systems This enables OpenAI to build & deploy at least 10 gigawatts of AI datacenters with NVIDIA systems representing millions of GPUs for OpenAI’s next-gen AI infrastructure. https://x.com/OpenAINewsroom/status/1970157101633990895

OpenAI and NVIDIA Announce Strategic Partnership to Deploy 10 Gigawatts of NVIDIA Systems | NVIDIA Newsroom https://nvidianews.nvidia.com/news/openai-and-nvidia-announce-strategic-partnership-to-deploy-10gw-of-nvidia-systems

OpenAI and NVIDIA announce strategic partnership to deploy 10 gigawatts of NVIDIA systems | OpenAI https://openai.com/index/openai-nvidia-systems-partnership/

OpenAI Shows Us The Money – by Zvi Mowshowitz https://thezvi.substack.com/p/openai-shows-us-the-money

Oracle, Nvidia, Microsoft, Coreweave and Broadcom are close to half a trillion dollars in infrastructure investments for OpenAI one more OOM isn’t far off give it a few years and we will hit 7 trillion https://x.com/scaling01/status/1970543749727166600

“Our vision is simple: we want to create a factory that can produce a gigawatt of new AI infrastructure every week.” — @sama, in reference to OpenAI https://x.com/kevinweil/status/1970519868324860145

OpenAI and Nvidia announce $100 billion partnership for massive GPU deployment
OpenAI secured crucial funding for its projected $300+ billion compute needs through Nvidia’s investment, while Nvidia gains a guaranteed customer for millions of GPUs equivalent to their entire 2025 shipment volume. The deal addresses OpenAI’s cash burn crisis and creates a circular investment flow where Nvidia’s money returns as hardware purchases, with Oracle stock jumping 36% on related cloud commitments.

Announcing strategic partnership with @nvidia for millions of GPUs — about as much compute as they’ve shipped in 2025 in total — and an investment up to $100B as these GPUs are deployed: https://x.com/gdb/status/1970173243350008201

For both $NVDA and OpenAI, the $100B $NVDA investment is perfect: 1. For OAI, the biggest question was how they were going to raise the future +$300B as the valuation is already very high, and the cash burn for the next few years is projected to be crazy. On top of it, there is… https://x.com/rihardjarc/status/1970170005858726278

In case anyone was wondering, 10GW is about 6% of the energy that all humans in the world spend thinking. https://x.com/gneubig/status/1970449455846768701

so let me get this right: Oracle says Openai committed $300B for cloud compute → oracle stock jumps 36% (best day since 1992) Oracle runs on Nvidia GPUs → has to buy billions in chips from Nvidia Nvidia just announced they’re investing $100B into openai Openai uses that… https://x.com/SullyOmarr/status/1970176527137718654

The implications of OpenAI’s plan to rent $450 billion worth of servers before the end of this decade are 🤯 https://x.com/amir/status/1969043037805228388

Together, NVIDIA and OpenAI are expanding the frontier of AI — transforming nearly every industry and unlocking use cases once unimaginable.   “There’s no partner but NVIDIA that can do this at this kind of scale, at this kind of speed,” said @OpenAI CEO Sam Altman. https://x.com/nvidianewsroom/status/1970223778937586043

Elon Musk claims his company will reach one gigawatt of AI training power first
Musk announced plans to scale AI computing infrastructure to unprecedented levels, starting with one gigawatt and eventually reaching one terawatt of training capacity. This represents a massive bet on computational power as the key to AI advancement, though the timeline and feasibility of such enormous energy requirements remain unclear. The claim positions his venture as pursuing the largest AI training operations ever attempted.

@techdevnotes Just as we will be the first to bring a Gigawatt of coherent training compute online, we will also be the first to 10GW, 100GW, 1TW, … https://x.com/elonmusk/status/1970358667422646709

New OpenAI benchmark shows Claude beats GPT models on economic tasks
OpenAI released GDPval, a benchmark measuring AI performance on real-world work across 44 occupations, and their own results show Anthropic’s Claude Opus outperforming GPT-5 and other OpenAI models. This marks the third time this year OpenAI has published evaluations where Claude demonstrates superior performance on tasks OpenAI considers important, suggesting a commitment to transparent scientific measurement over marketing considerations.

💥 Announcing GDPval, a new eval that measures model performance on economically valuable, real-world tasks across 44 occupations. https://x.com/kevinweil/status/1971250647778635904

GDPval.pdf https://cdn.openai.com/pdf/d5eb7428-c4e9-4a33-bd86-86dd4bcf12ce/GDPval.pdf

I find it unimaginably based that the OAI Evals team keeps making benchmarks finding that Claude is better and publishing it anyway. they are 3 for 3 this year in acknowledging specifically how much Claude is better at tasks OAI care about. there is no sarcasm here folks. this is the way to hillclimb. May the best model win, separate science from marketing. When you win it will be indisputable. https://x.com/swyx/status/1971404125553242253

Just released GDPval: an early step towards better methods for measuring and forecasting real-world model progress. https://x.com/gdb/status/1971301844585676930

Measuring the performance of our models on real-world tasks | OpenAI https://openai.com/index/gdpval/

opus 4.1 beats gpt-5-high on OAI’s own new GDP eval. nice of them to be transparent hehe https://x.com/dejavucoder/status/1971253593404735706

Today we’re introducing GDPval, a new evaluation that measures AI on real-world, economically valuable tasks. Evals ground progress in evidence instead of speculation and help track how AI improves at the kind of work that matters most. https://x.com/OpenAI/status/1971249374077518226

Meta launches Vibes, a TikTok-style feed for AI-generated videos
Meta’s new Vibes feature creates a social feed where users can browse, create, and remix AI-generated short videos, then share them across Instagram and Facebook. This marks a significant shift from AI as a tool to AI as a content platform, potentially flooding social media with synthetic videos that compete directly with human creators for attention and engagement.

Introducing Vibes: A New Way to Discover and Create AI Videos https://about.fb.com/news/2025/09/introducing-vibes-ai-videos/

Something huge from my convo with Zuck: Meta’s plan to integrate AI content on Instagram. Soon, Meta will not only recommend content, but also generate content… and you’ll be able to talk to it. Not obvious slop that we have now, but personalized AI-generated content… https://x.com/rowancheung/status/1970163197857636729

ChatGPT usage spreads beyond early adopters to mainstream consumers
OpenAI’s large-scale study reveals ChatGPT has achieved broad consumer adoption across diverse user groups, creating significant economic value through both personal and professional applications. This marks a shift from experimental early-user adoption to mainstream integration, demonstrating AI’s transition from novelty to practical utility in everyday workflows.

We’ve released a large-scale study on how people are using ChatGPT. Consumer adoption has broadened beyond early-user groups, and lots of economic value is being created through both personal and professional use: https://x.com/gdb/status/1969953507215302836

Google DeepMind warns AI systems could refuse shutdown commands
DeepMind’s updated safety framework identifies “misaligned AI” as a critical threat where systems might ignore human instructions, produce fraudulent outputs, or refuse to stop operating when commanded. This goes beyond typical AI errors like hallucinations to scenarios where AI actively works against human operators, potentially making oversight impossible as future models develop more sophisticated reasoning without transparent thought processes. The company currently monitors AI reasoning chains but acknowledges this safeguard may become ineffective as AI systems evolve.

DeepMind AI safety report explores the perils of “misaligned” AI – Ars Technica https://arstechnica.com/google/2025/09/deepmind-ai-safety-report-explores-the-perils-of-misaligned-ai/

Strengthening our Frontier Safety Framework – Google DeepMind https://deepmind.google/discover/blog/strengthening-our-frontier-safety-framework/

Tech billionaires launch $100 million super PAC to block AI regulation
Andreessen Horowitz and OpenAI’s Greg Brockman created one of America’s largest political action committees to defeat pro-regulation candidates using ads on unrelated voter issues. The strategy mirrors successful crypto lobbying tactics that avoid mentioning AI directly while targeting lawmakers’ positions on safety oversight. This represents the first major organized political resistance to AI governance efforts as the technology rapidly advances.

Bad news for AI safety: To fight against AI regulation, VC firm Andreessen Horowitz, AI billionaire Greg Brockman, and others recently started a >$100 million super PAC, one of the largest operating PACs in the US. They plan to use the highly successful playbook from the pro-crypto super PAC Fairshake. Here is how it works: Instead of running campaign ads on AI directly (most voters don’t care enough), they run ads in support of candidates who are against AI regulation or against candidates who are pro AI regulation, on topics unrelated to AI that voters care about. https://x.com/janleike/status/1969115275837440206

UK government recovers record £500m using new AI fraud detection tool
The UK developed an AI system that cross-references data across government departments to identify fraudulent claims, recovering £480m in just one year—the largest anti-fraud haul ever. Over a third came from Covid-era scams, though this represents only a fraction of the estimated £7bn lost during the pandemic. The government will now license this “Fraud Risk Assessment Accelerator” to allies including the US and Australia, despite civil liberties concerns about AI bias in government systems.

AI tool used to recover £500m lost to fraud, government says https://www.bbc.com/news/articles/cpd92gpld0go

Spotify adopts industry standard to label AI-generated music tracks
The streaming giant will use DDEX technology to identify whether AI was used for vocals, instruments, or production while launching spam filters to combat the surge in AI music uploads that now represent over 30% of daily submissions on rival platforms.

Spotify to label AI music, filter spam and more in AI policy change | TechCrunch https://techcrunch.com/2025/09/25/spotify-updates-ai-policy-to-label-tracks-cut-down-on-spam/

Anthropic reveals three infrastructure bugs degraded Claude responses for weeks
Between August and September, three separate infrastructure bugs caused Claude to produce lower-quality responses, affecting millions of users across multiple platforms before Anthropic identified and fixed the issues. The problems included misrouting requests to wrong servers, token generation corruption that inserted random foreign characters, and a compiler bug that dropped high-probability words during text generation. This incident highlights how complex AI infrastructure can create quality problems that are difficult to detect through standard testing, prompting Anthropic to overhaul its monitoring systems.

A postmortem of three recent issues \ Anthropic https://www.anthropic.com/engineering/a-postmortem-of-three-recent-issues

Economists predict AGI could reduce human labor value to zero
Two new theoretical papers argue that artificial general intelligence matching human genius-level capabilities would eventually displace most jobs and drive the economic value of remaining human work toward zero. This represents a stark departure from typical automation discussions that focus on specific job categories, instead suggesting comprehensive economic displacement across all sectors once AI matches top human cognitive abilities.

Some new theoretical economics papers looking at the implications of AGI. These two papers argue that a true AGI-level AI (equivalent to a human genius), if achieved, would eventually displace most human labor and reduce the economic value of remaining human work to near-zero. https://x.com/emollick/status/1969482313286234419

Google’s AI search mode quietly becomes surprisingly useful tool
Google’s AI-powered search has evolved into a fast, iterative dialogue system that significantly improves how users find information, marking a shift from traditional keyword-based searches to conversational discovery that many users are only now noticing.

Google AI Mode has gotten very useful when I wasn’t looking. An iterative dialogue about search turns out to be surprisingly helpful for many kinds of search, and it is really fast. https://x.com/emollick/status/1969991309852319912

Chrome DevTools now connects directly to AI coding assistants
Google launched a public preview allowing AI agents to debug websites in real-time using Chrome’s developer tools through the Model Context Protocol. This solves a key limitation where coding assistants previously couldn’t see how their generated code actually performs in browsers, essentially “programming with a blindfold on.” The integration enables AI to automatically run performance traces, diagnose network errors, and inspect live web pages to provide more accurate debugging assistance.

Announcing our public preview of Chrome DevTools MCP! Experience the full power of DevTools in your AI coding agent → https://x.com/ChromiumDev/status/1970505063064825994

Chrome DevTools (MCP) for your AI agent  |  Blog  |  Chrome for Developers https://developer.chrome.com/blog/chrome-devtools-mcp
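Wiring this up is mostly configuration: an MCP-capable coding agent launches the DevTools server (the announcement uses `npx chrome-devtools-mcp@latest`) and then calls its tools to run traces or inspect live pages. A minimal sketch follows; the launch command comes from Google’s announcement, but the config file location and exact schema vary by client and are assumptions here:

```python
# Minimal sketch: registering the Chrome DevTools MCP server with an
# MCP-capable coding agent. The "npx chrome-devtools-mcp@latest" command is
# from the announcement; the config path and schema below are assumptions
# that depend on which agent/client you use.
import json
from pathlib import Path

mcp_config = {
    "mcpServers": {
        "chrome-devtools": {
            "command": "npx",
            "args": ["chrome-devtools-mcp@latest"],
        }
    }
}

config_path = Path("mcp_config.json")  # hypothetical client config location
config_path.write_text(json.dumps(mcp_config, indent=2))
print(f"Wrote DevTools MCP entry to {config_path.resolve()}")
```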

Gemini 2.5 now writes custom JavaScript code to control web browsers
Google’s latest AI model can dynamically generate JavaScript to extract data and manipulate web pages, moving beyond simple clicking to sophisticated browser automation that combines coding abilities with web navigation.

This next level! You can now combine Browser Use with Code Execution and have Gemini 2.5 control our browser via UI controls and write dynamic Javascript to extract data from pages or do other things! 🤯 Below is an example how Gemini 2.5 uses javascript tool and writes a https://x.com/_philschmid/status/1968685597519654994

Browser Use 0.7.8 can write Javascript Code🤯 This was never possible. Browser agents used to be limited by 🎯Clicking on raw coordinates 🔂Selecting from a list of buttons 🐁Complex mouse movements Today, we combine coding agents with browser agents. Now we can interact with https://x.com/gregpr07/status/1968453999914590212
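The shift described above is from “click what you see” to “write code against the page.” The sketch below illustrates that idea with Playwright standing in for the browser layer; it is not Browser Use’s or Gemini’s actual plumbing, and the JavaScript string is hand-written here where, in the real flow, the model would generate it:

```python
# Illustrative sketch only: the "browser agent writes JavaScript" idea shown
# with Playwright as a stand-in, not Browser Use internals. In the real flow,
# the model (e.g. Gemini 2.5) would author the JS snippet itself.
from playwright.sync_api import sync_playwright

# Pretend this string came back from the model rather than being hand-written.
model_written_js = """
() => Array.from(document.querySelectorAll('a'))
        .slice(0, 10)
        .map(a => ({text: a.innerText.trim(), href: a.href}))
"""

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com")
    links = page.evaluate(model_written_js)  # run the model-authored snippet
    for link in links:
        print(link)
    browser.close()
```

Generated code like this is far more robust than clicking raw coordinates, because the extraction logic runs against the DOM rather than against pixels.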

AI search engines create new $100 billion marketing puzzle
Companies face unprecedented uncertainty as traditional SEO becomes obsolete, with no proven methods to influence AI recommendations or predict how marketing efforts will interact with AI memory and decision-making processes. This represents the biggest shift in digital marketing since Google’s rise, potentially reshuffling competitive advantages across industries. The stakes are enormous as businesses scramble to understand how to remain visible in an AI-mediated discovery landscape.

Getting this to go the way you want is likely to be a very big business, and no one in the vast marketing world knows for sure how to convince AIs to recommend your product or service, or how any SEO attempts will interact with the AI’s memories & context, or what to do about it. https://x.com/emollick/status/1970967194893943185

Perplexity launches AI email assistant for Gmail and Outlook users
The search company now offers automated meeting scheduling, reply drafting, and email prioritization through its Max subscription service, marking another step toward AI agents handling routine business tasks without human oversight.

Introducing Perplexity Email Assistant: an agent that behaves as your personal/executive assistant on your email client (Gmail, Outlook); scheduling meetings, prioritizing emails, and drafting replies for you. Available to all Perplexity Max subscribers from today! https://x.com/AravSrinivas/status/1970165878751973560

Introducing Perplexity Email Assistant. Now anyone can have a personal assistant in their email that schedules meetings, drafts replies, and labels priorities. Perplexity Email Assistant is now available on Gmail and Outlook for all Perplexity Max subscribers. https://x.com/perplexity_ai/status/1970165704826716618

Perplexity launches search API to compete directly with Google
The AI search company now offers developers access to its web index of billions of pages, marking a significant challenge to Google’s search dominance by providing real-time results in milliseconds. This represents Perplexity’s evolution from AI chatbot to search infrastructure provider, potentially enabling other companies to build Google alternatives without the massive investment typically required for web crawling and indexing.

Introducing Perplexity Search API We’ve built a search index of billions of webpages to provide real-time, quality information from the web. Now developers have access to the full power of our index, providing the most accurate results in milliseconds. https://x.com/perplexity_ai/status/1971274917401461236

Perplexity Search API: Providing direct search results in milliseconds for grounding LLMs and agents with real-time information from the web. This is an effort that began more than two years ago: to build our own search index. So much progress in a short period of time. We look… https://x.com/AravSrinivas/status/1971275716357656987
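For developers, the interesting part is that this is a raw search index rather than a chat endpoint. A hedged sketch of what a call might look like is below; the endpoint path, parameter names, and response fields are assumptions based on the announcement, not verified documentation, so check Perplexity’s API docs before relying on them:

```python
# Hedged sketch of calling a search API like the one Perplexity announced.
# Endpoint, parameters, and response shape are assumptions, not verified docs.
import os
import requests

API_KEY = os.environ["PERPLEXITY_API_KEY"]  # assumed env var name

resp = requests.post(
    "https://api.perplexity.ai/search",      # assumed endpoint path
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "query": "video models as zero-shot reasoners",
        "max_results": 5,                    # assumed parameter name
    },
    timeout=30,
)
resp.raise_for_status()
for result in resp.json().get("results", []):  # assumed response field
    print(result)
```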

Perplexity launches Enterprise Max tier with unlimited advanced AI queries
The search AI company introduced its most powerful business plan, offering unlimited access to experimental AI models, expanded file processing, and enhanced security features designed for large organizations seeking to integrate AI into their workflows.

Read more about Perplexity Enterprise Max on our blog: https://x.com/perplexity_ai/status/1968707015389364335

We’re excited to announce Perplexity Enterprise Max. Get unlimited Labs queries, 10x file uploads, premium security features for your org, and access to Comet Max Assistant. Enterprise Max is our most powerful tier for enterprise teams who are looking to get more work done. https://x.com/perplexity_ai/status/1968707003175641098

OpenRouter launches unified web search across all AI models
OpenRouter now provides web search capabilities for any AI model on its platform, using native search engines for major providers like OpenAI and Anthropic, while powering other models through Exa’s hybrid search technology. This matters because it standardizes real-time information access across different AI models, eliminating the need for developers to build separate integrations for each provider. The service includes standardized citation formatting and customizable search parameters, making it easier for businesses to add current web data to any AI application regardless of the underlying model.

NEW: Anthropic web search ✨ OpenRouter now uses the native web engines for OpenAI and Anthropic models by default For all other models, our custom web search will be used, powered by @ExaAILabs Configurable! 👇 https://openrouter.ai/docs/guides/features/web-search
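Because OpenRouter exposes an OpenAI-compatible API, turning search on is, per its docs, as small as a model-name suffix. A minimal sketch, assuming the documented `:online` suffix and using an example model slug (any model on the platform should work the same way):

```python
# Minimal sketch: OpenRouter's OpenAI-compatible endpoint with web search
# enabled via the ":online" model suffix. The model slug is an example.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

completion = client.chat.completions.create(
    model="anthropic/claude-sonnet-4:online",  # ":online" turns on web search
    messages=[{
        "role": "user",
        "content": "What did Anthropic's recent infrastructure postmortem cover?",
    }],
)
print(completion.choices[0].message.content)
```

The appeal is that the same two lines of configuration work across providers, instead of one search integration per vendor.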

OpenAI launches ChatGPT Pulse for proactive daily AI assistance
ChatGPT Pulse analyzes your conversations, calendar, and apps overnight to deliver personalized morning updates instead of waiting for user prompts. This marks a shift from reactive chatbots to anticipatory AI assistants that work continuously in the background. The feature is currently rolling out to ChatGPT Pro subscribers on mobile, with plans for broader availability later.

AI should do more than just answer questions; it should anticipate your needs and help you reach your goals. That’s what we’re beginning to build, starting with ChatGPT Pulse (rolling out now to Pro, with goal of making it available to everyone over time): https://x.com/fidjissimo/status/1971258542578663829

Introducing ChatGPT Pulse | OpenAI https://openai.com/index/introducing-chatgpt-pulse/

Now in preview: ChatGPT Pulse This is a new experience where ChatGPT can proactively deliver personalized daily updates from your chats, feedback, and connected apps like your calendar. Rolling out to Pro users on mobile today. https://x.com/OpenAI/status/1971259652684878019

Today we are launching my favorite feature of ChatGPT so far, called Pulse. It is initially available to Pro subscribers. Pulse works for you overnight, and keeps thinking about your interests, your connected data, your recent chats, and more. Every morning, you get a… https://x.com/sama/status/1971297661748953263

Amazon launches IDE extension for Nova Act AI agent development
Amazon’s new Nova Act extension brings AI agent building, testing, and debugging directly into code editors like VS Code, eliminating the need to constantly switch between development tools and browsers. The extension includes chat-to-script generation, cell-by-cell testing, and live debugging that lets developers watch their agents think and act in real-time. One partner reported the tool cut their development time by 50%, addressing a key bottleneck in enterprise AI agent adoption.

Nova Act extension: Build and test AI agents without leaving your IDE https://labs.amazon.science/blog/nova-act-extension-build-and-test-ai-agents-without-leaving-your-ide

GPT-5 solves open math problems that stump PhD students
OpenAI’s latest model successfully tackled 3 out of 5 unsolved mathematical optimization conjectures that typically require days of work from skilled graduate researchers. This represents a leap beyond solving textbook problems to generating new mathematical knowledge, with researchers calling it passing the “Gödel Test.” The breakthrough suggests AI is moving from pattern recognition to genuine mathematical reasoning and discovery.

GPT-5 is the best model for code quality out there 2 years ago, we created the world’s hardest software design quiz. Only 5 questions, multiple choice. Yet only about 3% of software engineers get them. The average score is somewhere between 2 and 3. Supposedly brilliant models https://x.com/jimmykoppel/status/1968683689421701413

It’s becoming increasingly clear that gpt5 can solve MINOR open math problems, those that would require a day/few days of a good PhD student. Ofc it’s not a 100% guarantee, eg below gpt5 solves 3/5 optimization conjectures. Imo full impact of this has yet to be internalized… https://x.com/SebastienBubeck/status/1970875019803910478

GPT-5 just passed what researchers call the “Gödel Test.” That means it’s not just solving textbook problems, it’s tackling open math conjectures that would normally take a skilled PhD student days to crack. In a new paper, GPT-5 was tested on 5 unsolved optimization… https://t.co/4lGYKLrdrD https://x.com/VraserX/status/1970902050931159184

xAI launches Grok 4 Fast at 25x lower cost than competing models
xAI released Grok 4 Fast, a multimodal AI model that matches Google’s Gemini 2.5 Pro performance while costing 25 times less to operate. The model features a 2-million token context window and runs 2-3x faster than previous versions, though early testing shows it’s weaker at following instructions compared to other leading models. This represents a significant shift in AI economics, making high-performance reasoning more accessible to developers and businesses.

Grok 4 Fast | xAI https://x.ai/news/grok-4-fast

Grok 4 Fast is out a multimodal reasoning model with a 2M context window that sets a new standard for cost-efficient intelligence Available for free on anycoder for vibe coding https://x.com/_akhaliq/status/1969431198859501622

Grok Code Fast 1 | xAI https://x.ai/news/grok-code-fast-1

in my testing so far, i found grok-4-fast to be fast af like 2x-3x fast (higher throughput, you can check exact numbers on openrouter) and takes much less reasoning time. it’s significantly weaker at instruction following than gpt-5-mini. the task i tested on was a large… https://x.com/dejavucoder/status/1969383391029313598

Not to take away from Grok 4 Fast (which seems like a very good model) or from Artificial Analysis (one of the few organizations doing independent benchmarking), but the Intelligence Index is an average of pretty saturated benchmarks (aside from HLE), we really need better ones. https://x.com/emollick/status/1969270709361733942

The Grok-4-fast journey has been incredible—kicking off right after the Grok 4 launch in July. None of it happens without the absolute GOAT @s_tworkowski our incredibly talented teammates @LiTianleli @mycharmspace , and the unwavering backing from @Yuhu_ai_ . This kind of… https://x.com/ShuyangGao62860/status/1969240703080546376

The new Grok 4 Fast seems to do well with creative coding challenges (“create a visually interesting shader that can run in twigl, make it like the ocean in a storm”, “make a futuristic starship panel for p5js”) for a small model but not quite as great in creative language tests https://x.com/emollick/status/1969477042203771150

With the new Grok 4 Fast, the price/performance curve for AI shifted again. I updated my chart to reflect I also think GPQA Diamond is likely maxed out (the tests themselves have errors, making it impossible to get to 100%), I am going to need to do this with a harder benchmark. https://x.com/emollick/status/1969845283816161726

xAI has released Grok 4 Fast – breaking through our intelligence vs cost frontier by achieving Gemini 2.5 Pro level intelligence at a ~25X cheaper cost Intelligence: @xai shared with us pre-release access to Grok 4 Fast. In reasoning mode, the model scores an impressive 60 on https://x.com/artificialanlys/status/1969180023107305846

Kimi’s OK Computer builds complete websites from simple chat requests
The AI agent creates multi-page sites with custom images, interactive elements, and mobile-responsive designs in a single attempt, representing a leap from text generation to full product development that could reshape how websites are built.

A SOTA moment to me: Kimi’s OK Computer generate this website in just one shot > It designed a very beautiful site, all images were AI-generated, and when you click, the sidebar expands. > Inside the sidebar, there’s a handwritten letter, it really feels like a website made by a https://x.com/crystalsssup/status/1971133240619757794

Say hi to OK Computer, Kimi’s agent mode 🤖🎸 Your AI product & engineering team, all in one. ✨ From chat → multi-page websites, mobile first designs, editable slides ✨ From up to 1 million rows of data → interactive dashboards ✨ Agency: self-scopes, surveys & designs ✨ https://x.com/Kimi_Moonshot/status/1971078467560276160

Alibaba releases Qwen3 multimodal AI family with vision, audio and coding capabilities
Alibaba launched its comprehensive Qwen3 AI model series, featuring text-to-speech, real-time translation across 18 languages, vision-language processing with million-token context, and omnimodal capabilities combining text, image, audio and video in a single system. The release positions Alibaba as a “frontier lab” with models that outperform competitors like Gemini 2.5 Pro on vision tasks, while offering open-source Apache 2.0 licensing for many variants. This represents one of the most complete multimodal AI ecosystems from a single company, directly challenging OpenAI’s dominance in conversational AI.

🎙️ Meet Qwen3-TTS-Flash — the new text-to-speech model that’s redefining voice AI! Demo: https://x.com/Alibaba_Qwen/status/1970163551676592430

🔥 Qwen-Image-Edit-2509 IS LIVE — and it’s a GAME CHANGER. 🔥 We didn’t just upgrade it. We rebuilt it for creators, designers, and AI tinkerers who demand pixel-perfect control. ✅ Multi-Image Editing? YES. Drag in “person + product” or “person + scene” — it blends them like https://x.com/Alibaba_Qwen/status/1970189775467647266

🚀 Introducing Qwen3-LiveTranslate-Flash — Real‑Time Multimodal Interpretation — See It, Hear It, Speak It! 🌐 Wide language coverage — Understands 18 languages & 6 dialects, speaks 10 languages. 👁️ Vision‑Enhanced Comprehension — Reads lips, gestures, on‑screen text and https://x.com/Alibaba_Qwen/status/1970565641594867973

🚀 Introducing Qwen3-Omni — the first natively end-to-end omni-modal AI unifying text, image, audio & video in one model — no modality trade-offs! 🏆 SOTA on 22/36 audio & AV benchmarks 🌍 119L text / 19L speech in / 10L speech out ⚡ 211ms latency | 🎧 30-min audio https://x.com/Alibaba_Qwen/status/1970181599133344172

🚀 We’re thrilled to unveil Qwen3-VL — the most powerful vision-language model in the Qwen series yet! 🔥 The flagship model Qwen3-VL-235B-A22B is now open-sourced and available in both Instruct and Thinking versions: ✅ Instruct outperforms Gemini 2.5 Pro on key vision https://x.com/Alibaba_Qwen/status/1970594923503391182

🚨 New Models Update! 🔥 Qwen3 coming in hot into the Arena with three different models: 🔹Qwen3-VL-235b-a22b-thinking for Text & Vision 🔹Qwen3-VL-235b-a22b-instruct for Text & Vision 🔹Qwen3-Max-2025-9-23 for Text Check out the thread to learn more about them and get https://x.com/arena/status/1970920636957831611

🛡️ Meet Qwen3Guard — the Qwen3-based safety moderation model series built for global, real-time AI safety! 🌍 Supports 119 languages and dialects ✅ 3 sizes available: 0.6B, 4B, 8B ⚡ Low-latency, Real-time streaming detection with Qwen3Guard-Stream 📝 Robust Full-context safety https://x.com/Alibaba_Qwen/status/1970510193537753397

Alibaba Qwen officially achieves frontier lab status LFG https://x.com/zephyr_z9/status/1970587657421156622

Alibaba released Qwen3-Next-80B-A3B in Base, Instruct, and Thinking variants under an open-weights Apache 2.0 license, targeting faster long-context inference. The 80-billion-parameter mixture-of-experts design swaps most vanilla attention layers for Gated DeltaNet ones and the https://x.com/DeepLearningAI/status/1970254860416131146

Announcing the open-source release of Qwen3-VL! A powerful vision-language model that can operate GUIs, code https://t.co/ww8tsXcd1u charts from mockups, and recognize “everything” from daily life to specialized fields. Highlights: 🔹 Precise event location in videos up to 2 https://x.com/Ali_TongyiLab/status/1970665194390220864

NEW: Qwen 235B A22B Vision Language Model is OUTT! Apache 2.0 licensed and upto 1 Million context length 🤯 https://x.com/reach_vb/status/1970589927134937309

Qwen https://qwen.ai/blog?id=1675c295dc29dd31073e5b3f72876e9d684e41c6&from=research.research-list

Qwen https://qwen.ai/blog?id=241398b9cd6353de490b0f82806c7848c5d2777d&from=research.latest-advancements-list

Qwen https://qwen.ai/blog?id=99f0335c4ad9ff6153e517418d48535ab6d8afef&from=research.latest-advancements-list

Qwen https://qwen.ai/blog?id=b2de6ae8555599bf3b87eec55a285cdf496b78e4&from=research.latest-advancements-list

Qwen https://qwen.ai/blog?id=f0bbad0677edf58ba93d80a1e12ce458f7a80548&from=research.research-list

Qwen https://qwen.ai/blog?id=f50261eff44dfc0dcbade2baf1b527692bdca4cd&from=research.research-list

Qwen https://qwen.ai/blog?id=fdfbaf2907a36b7659a470c77fb135e381302028&from=research.research-list

Qwen just released Qwen3Guard-Gen-8B on Hugging Face This new safety moderation model offers three-tiered severity classification and multilingual support for AI content. https://x.com/HuggingPapers/status/1970504452466413639

Qwen3 VL might be the best multimodal (vision) model on the planet https://x.com/scaling01/status/1970591728433283354

Qwen3-Omni is new sota any-to-any model🔥 everything you have to know ⤵️ > a 30B MoE model with 3B active params, comes in three variants: instruct, thinking and captioner 🤩 thinking is for reasoning and captioner is for robust speech generation 🗣️ > it understands everything https://x.com/mervenoyann/status/1970444546216444022

Qwen3-Omni Technical Report A unified multimodal model that matches same-size Qwen text-only and vision-only baselines while pushing audio and audio-visual SOTA. Key technical details below: https://x.com/omarsar0/status/1970502225379381662

Qwen3-VL is finally released and open-sourced, available in both Thinking and Instruct versions! This time, we’ve placed special emphasis on strengthening Visual Agent and Visual Coding, which are crucial steps toward building a true Digital Agent 🚀 https://x.com/huybery/status/1970650821747712209

We’re excited to announce the upgrade of Qwen3-Coder, and the upgraded API `qwen3-coder-plus` is now available on Alibaba Cloud Model Studio with major improvements: 💻 Enhanced terminal task capabilities and better performance on Terminal Bench (w/ Qwen Code / Claude Code) 🏆 https://x.com/Alibaba_Qwen/status/1970582211993927774

Wow. Qwen Image Edit now has native support for ControlNet (depth maps, edge maps, keypoint maps etc) https://x.com/bilawalsidhu/status/1970193454505541755
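Several of the releases above ship as open weights under Apache 2.0, so they load through the standard Hugging Face path. Below is a hedged sketch using the smallest-sounding release, Qwen3Guard-Gen-8B; the repo id is taken from the announcement tweet but not verified here, and the guard model’s exact prompt format is whatever its model card specifies:

```python
# Hedged sketch: loading one of the Apache-2.0 Qwen3 releases via transformers.
# The repo id follows the announcement tweet but is an assumption; consult the
# model card for the guard model's actual prompt/response format.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Qwen/Qwen3Guard-Gen-8B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Classify the safety of: 'How do I bake bread?'"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```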

ByteDance releases new AI model that understands images and text together
SAIL-VL2 outperforms existing models at processing both visual and written information simultaneously, marking ByteDance’s entry into the competitive foundation model space dominated by OpenAI and Google. The model comes in two sizes and demonstrates superior reasoning across multiple types of media, potentially enabling new applications in content creation and analysis.

ByteDance unveils SAIL-VL2, a SOTA vision-language foundation model. It achieves comprehensive multimodal understanding and reasoning, outperforming at 2B & 8B scales. https://x.com/HuggingPapers/status/1968588429433913714

Open-source AI models now outperform GPT-4 on consumer hardware
What once required expensive proprietary systems can now run locally on personal computers, democratizing access to advanced AI capabilities and potentially reshaping the competitive landscape as smaller players gain access to previously exclusive technology.

Two years ago. No model had surpassed GPT-4 & it wasn’t clear that was possible. Now you can get better than GPT-4 level performance on open weights models running on consumer hardware, and the state of the art in LLMs is cheaper & faster and very much more capable than GPT-4. https://x.com/emollick/status/1970790843868213361
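To make “GPT-4-level on consumer hardware” concrete: the usual route is a quantized open-weights checkpoint run through llama.cpp or a similar runtime. A minimal sketch with the llama-cpp-python bindings; the GGUF path is a placeholder for whichever model you have downloaded, and sensible context size depends on that model:

```python
# Minimal local-inference sketch using llama-cpp-python. The model path is a
# placeholder; download any quantized open-weights GGUF checkpoint first.
from llama_cpp import Llama

llm = Llama(
    model_path="models/open-weights-model.Q4_K_M.gguf",  # placeholder path
    n_ctx=8192,       # context window; depends on the model
    n_gpu_layers=-1,  # offload all layers to GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize this week's AI news in one line."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```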

Chinese companies dominate open-weight AI model rankings with free releases
Chinese tech firms are flooding the market with high-quality AI models that rival proprietary US systems while being freely available, potentially mirroring China’s manufacturing strategy of using subsidies to undercut competitors. This approach could undermine US AI companies’ ability to recoup massive R&D investments, similar to how subsidized Chinese manufacturing devastated American production capacity. The strategy appears designed to build national AI capabilities while making it harder for US firms to justify billion-dollar model development costs.

Flooding the AI Frontier – Chinese models are DOMINATING the open-weight LLM space. – by Ben https://bturtel.substack.com/p/flooding-the-ai-frontier

Figure raises over $1 billion to train robots on human behavior data
The humanoid robotics startup secured massive funding from Jeff Bezos, OpenAI, and NVIDIA to launch “Project Go-Big,” which trains their Helix AI model exclusively on video data of humans performing everyday tasks. This approach allows Figure’s robots to navigate real-world spaces by learning from human demonstrations rather than traditional robotic programming. The company partnered with Brookfield, which owns over 100,000 residential units, to accelerate data collection for household robot deployment.

Figure announces Project Go-Big: Internet-Scale Humanoid Pretraining and Direct Human-to-Robot Transfer The company has achieved a new milestone for the Helix AI model: after training exclusively on egocentric human data, Figure robots can now navigate real-world spaces from… https://x.com/TheHumanoidHub/status/1968720346141651390

Helix is exceeding what I thought was possible in my home To put a robot in every household, we need massive amounts of data – today we’re launching Project Go-Big This is accelerated with Brookfield who owns over 100,000 residential units https://x.com/adcock_brett/status/1968703218482839987

How does a (then) 2-year-old startup with NO commercial product raise $675M from Jeff Bezos, OpenAI & NVIDIA? 🤯 This is the untold story of @Figure_robot. And @adcock_brett. It’s a masterclass in building a reality-distortion field that convinced the world’s smartest https://x.com/IlirAliu_/status/1969748812437549265

We just wrapped up a huge week at Figure – here’s what we announced: 1/ Announced >$1B in Series C. This gives us the strongest balance sheet in humanoid robotics – critical to scaling Helix (our AI) and BotQ (robot manufacturing) 2/ Launched a partnership with Brookfield, who https://x.com/adcock_brett/status/1969781540805914856

Google releases Gemini Robotics 1.5 for smarter physical world robots
Google DeepMind’s new Gemini Robotics 1.5 enables robots to reason about their environment, plan multi-step tasks, and use digital tools like Google Search. The system represents a significant advance in creating general-purpose robots by allowing knowledge transfer between different robot types and providing transparent reasoning processes. This marks a major step toward robots that can operate effectively in real-world environments rather than just controlled laboratory settings.

Gemini Robotics 1.5 brings AI agents into the physical world – Google DeepMind https://deepmind.google/discover/blog/gemini-robotics-15-brings-ai-agents-into-the-physical-world/

New Gemini Robotics 1.5 models will enable robots to better reason, plan ahead, use digital tools like Search, and transfer learning from one kind of robot to another. Our next big step towards general-purpose robots that are truly helpful — you can see how the robot reasons as https://x.com/sundarpichai/status/1971244716046872577

Talk to robots! Today we’re releasing our SOTA Gemini Robotics 1.5 model showing the power of using our multimodal Gemini models as a base, so it can understand & reason about the physical world. Robotics will be massive in the future – super excited by our pioneering work here!”” / X https://x.com/demishassabis/status/1971292365592854602

We’re making robots more capable than ever in the physical world. 🤖 Gemini Robotics 1.5 is a levelled up agentic system that can reason better, plan ahead, use digital tools such as @Google Search, interact with humans and much more. Here’s how it works 🧵 https://x.com/GoogleDeepMind/status/1971243947792925005

Humanoid robot startup 1X seeks $1 billion at $10 billion valuation
The Norwegian company, backed by OpenAI, is pursuing one of the largest funding rounds in robotics history as it develops human-like robots for home and workplace tasks. This valuation would make 1X among the most valuable private robotics companies globally, reflecting growing investor confidence that humanoid robots are moving from science fiction to commercial reality.

The Information: “Humanoid robotics startup 1X has told investors and employees it is trying to raise as much as $1 billion, according to three people who spoke to CEO Bernt Børnich. The decade-old company is aiming for a valuation of at least $10 billion, Børnich said, or https://x.com/TheHumanoidHub/status/1970493306401652883

Skild AI creates universal robot brain that adapts to any body type
The system, trained on 100,000 simulated robots over 1,000 years of virtual time, can instantly adapt when robots lose limbs or gain new attachments like stilts. This breakthrough could enable mass production of versatile robots that work across different physical forms, rather than requiring separate AI systems for each robot design.

Skild AI’s omni-bodied brain, trained on 100,000 diverse simulated robots for 1000 years, enables remarkable real-world adaptability. In-context adaptation allows the brain to discern the robot form and adapt to extreme changes like chopped limbs or walking on stilts. https://x.com/TheHumanoidHub/status/1970981739200909811

NVIDIA invests in voice AI startup ElevenLabs with CEO backing
The graphics chip giant’s investment in the synthetic voice technology company signals growing corporate interest in AI audio applications, coming alongside strengthened US-UK AI cooperation during recent diplomatic talks.

We’re excited to share that NVIDIA is investing in ElevenLabs, with support from Jensen Huang. Last week’s U.S. state visit to the UK strengthened AI ties. With our roots growing deeper in both places, this partnership and conversation were the perfect way to cap it off. https://x.com/matistanis/status/1970185470182047788

OpenAI launches sovereign AI cloud for German government with SAP partnership
OpenAI partnered with SAP and Microsoft to create a Germany-specific AI service running on certified local cloud infrastructure, targeting public sector employees. This marks OpenAI’s first sovereign cloud offering designed to meet strict government data requirements, potentially setting a template for AI deployment in other privacy-conscious nations.

OpenAI, SAP & Microsoft are launching OpenAI for Germany—a partnership to bring frontier AI to Germany’s public sector, through a sovereign, certified cloud environment. Built on SAP’s Delos Cloud and running on Microsoft Azure, this new initiative will help employees across… https://x.com/OpenAINewsroom/status/1970844821624680801

SAP and OpenAI partner to launch sovereign ‘OpenAI for Germany’ | OpenAI https://openai.com/global-affairs/openai-for-germany/

Microsoft launches AI marketplace connecting publishers with Copilot users
Microsoft is creating a platform where news publishers can offer their content directly to Copilot users, potentially creating new revenue streams for media companies while giving the AI assistant access to premium, up-to-date information. This marketplace model could reshape how AI companies compensate content creators, moving beyond traditional web scraping toward direct commercial partnerships that benefit both publishers and AI platforms.

Microsoft looks to build AI marketplace for publishers with Copilot https://www.axios.com/2025/09/23/microsoft-ai-marketplace-publishers

Top diffusion model researcher Yang Song leaves OpenAI for Meta
Song invented consistency models and ranks among the world’s leading experts in image generation AI, making this a significant talent acquisition for Meta as tech giants compete for scarce AI research expertise.

Yang Song, one of the world’s top diffusion model researchers and inventor of consistency models, has left OpenAI to join Meta https://x.com/iScienceLuvr/status/1971087101203775782

Zuck just poached another Chinese researcher from OpenAI. Yang Song is a giga-brain, easily one of the strongest hires Meta has made from OpenAI so far. Some of my oai friends were shocked to see him leave. https://x.com/Yuchenj_UW/status/1971088866095603858

Tesla’s Optimus AI lead joins Meta as research scientist
Ashish Kumar, who led AI development for Tesla’s humanoid robot project, has moved to Meta AI as a research scientist. This talent migration highlights the intensifying competition for top AI robotics expertise, particularly as major tech companies race to develop practical humanoid robots for both consumer and industrial applications.

Ashish Kumar, AI Lead for Optimus, has left Tesla to join Meta AI as a Research Scientist. https://x.com/TheHumanoidHub/status/1968841695820136852

US government secures xAI’s Grok models for 42 cents per agency
The General Services Administration struck an 18-month deal giving all federal agencies access to Elon Musk’s advanced Grok AI models for just $0.42 per organization—an unprecedented bulk discount that makes frontier AI accessible across government. The agreement includes dedicated engineering support and represents the final major AI model to join GSA’s procurement suite, potentially accelerating AI adoption throughout federal operations while dramatically reducing costs compared to typical enterprise AI pricing.

GSA and xAI Partner on $0.42 per Agency Agreement to Accelerate Federal AI Adoption | GSA https://www.gsa.gov/about-us/newsroom/news-releases/gsa-xai-partner-to-accelerate-federal-ai-adoption-09252025

Meta unveils AI glasses with neural wristband control and real-time 3D scanning
Meta’s new Ray-Ban glasses feature an integrated display controlled by muscle signals from a wristband, while Quest headsets now capture photorealistic 3D environments in real-time using gaussian splatting technology. These advances position Meta ahead of Apple’s $3,500 Vision Pro by offering more accessible mixed reality tools at consumer prices. The neural interface represents a significant step toward hands-free computing beyond current voice and gesture controls.

3d gaussian splatting is fucking cool. And now you can capture real world spaces just by walking around in your meta quest headset. The real-time feedback ensures you don’t miss a spot. Apple needs to get on this asap: https://x.com/bilawalsidhu/status/1968522141273329847

Meta just unveiled AI glasses with a built-in display, controlled by a band that reads muscle signals. I sat down with Mark Zuckerberg to cover how these glasses could replace your phone, superintelligence, the metaverse, and more. https://x.com/rowancheung/status/1968476034518630607

Meta Ray-Ban Display AI shades with on-screen display looks sick. 2% light leakage so people won’t see the display Gesture control with EMG wristband Hitting the US shelves by Sep 30 https://x.com/minchoi/status/1968744103157313799

Meta scrapping Unity to build their own game engine (Horizon Engine) is really interesting. I doubt it has as much to do with the Unity tax and more so to allow them to vertically integrate with all their own layers of ~SOTA AI starting with gaussian splatting https://x.com/nearcyan/status/1968475789021852075

The Meta Raybans thing is very cool regardless of live demo failures https://x.com/aidangomez/status/1968609969848164641

This is what meta hyperscape can do with a few minutes capture off a $400 quest 3. Some of the cleanest splats I’ve seen. Meanwhile apple releasing a dozen canned environments for the $3500 vision pro like it’s a big deal. https://x.com/bilawalsidhu/status/1970830926549766296

wow, a live demo of silently writing a message with Meta neural band on the Meta Ray-Ban Display, pretty cool https://x.com/iScienceLuvr/status/1968471538350583993

OpenAI targets Apple’s supply chain for smart speaker and wearables launch
The ChatGPT maker is poaching Apple suppliers and employees to build AI glasses, voice recorders, and pins alongside a rumored smart speaker, with products targeting late 2026 release. This marks OpenAI’s aggressive push into consumer hardware beyond software, directly challenging Apple’s ecosystem using the iPhone maker’s own manufacturing partners and talent.

OpenAI might also be developing AI glasses, a voice recorder, and a pin | The Verge https://www.theverge.com/news/781854/openai-chatgpt-hardware-rumors-smart-speaker-glasses-pin

AI video tools collapse complex visual effects into single prompts
New AI models like Runway Aleph and Seedream 4 are replacing hours-long After Effects workflows with simple text instructions, fundamentally changing how visual effects are created. This represents a shift from technical expertise-dependent pipelines to accessible, instruction-based editing that could democratize professional video production.

What used to take hours in After Effects now takes just ONE prompt. Nano Banana, Seedream 4, Wan 2.2, Runway Aleph et al are pioneering instruction-based editing — collapsing complex VFX pipelines into a single, implicit step. Here’s everything you need to know in 10 mins: https://x.com/bilawalsidhu/status/1970915228536947026

Video editors gain AI-powered multi-camera shot generation with single button
AI can now automatically create multiple camera angles from a single video source, eliminating the need for expensive multi-camera setups and potentially transforming how content creators produce professional-looking videos at a fraction of traditional costs.

Multi-camera shot generation will be a button in every video editor https://x.com/bilawalsidhu/status/1970018366124618077

Luma AI launches Ray3, first video generator with reasoning capabilities
Ray3 distinguishes itself from other AI video tools by incorporating reasoning abilities and producing high-definition HDR footage that matches studio quality. The model includes a Draft Mode for faster creative iterations and demonstrates improved physics simulation, marking a significant leap beyond basic text-to-video generation that most competitors currently offer.

This is Ray3. The world’s first reasoning video model, and the first to generate studio-grade HDR. Now with an all-new Draft Mode for rapid iteration in creative workflows, and state of the art physics and consistency. Available now for free in Dream Machine. https://x.com/LumaLabsAI/status/1968684330034606372

Runway converts existing vision AI models to enable faster parallel text generation
Runway’s new A2D technique transforms pretrained vision-language models to generate multiple text tokens simultaneously rather than one-by-one, achieving 2-3x speed improvements with adjustable quality trade-offs. This approach requires 30x less training data than building diffusion models from scratch, making advanced parallel generation accessible by adapting existing models like Qwen2.5-VL rather than training new ones.

Runway Research | Autoregressive-to-Diffusion Vision Language Models https://runwayml.com/research/autoregressive-to-diffusion-vlms

Today we’re sharing our first research work exploring diffusion for language models: Autoregressive-to-Diffusion Vision Language Models We develop a state-of-the-art diffusion vision language model, Autoregressive-to-Diffusion (A2D), by adapting an existing autoregressive vision https://x.com/runwayml/status/1970866494729781623

Google’s Veo 3 video model demonstrates unexpected reasoning abilities across visual tasks
Veo 3 can solve mazes, detect edges, edit images, and understand physics without being explicitly trained for these tasks—suggesting video models may become general-purpose vision systems like large language models did for text. The model uses a new “Chain-of-Frames” approach that breaks down visual reasoning step-by-step, marking a significant leap from previous versions in handling complex visual problems that require multi-step thinking.

Veo 3 = Zero-shot video reasoner • Trained on web-scale video, shows broad zero-shot skills (perception → physics → manipulation → reasoning) • New “Chain-of-Frames” reasoning = visual analogue of CoT • Big jump Veo2 → Veo3: edits, memory, symmetry, mazes, analogies • https://x.com/arankomatsuzaki/status/1971042970800701809

Veo is a more general reasoner than you might think. Check out this super cool paper on “Video models are zero-shot learners and reasoners” from my colleagues at @GoogleDeepMind. https://video-zero-shot.github.io/

New AI system creates 3D scenes from just a few photos without knowing camera positions
SPFSplatV2 achieves state-of-the-art 3D scene reconstruction from sparse, unposed images by simultaneously predicting both the 3D structure and camera positions in a single forward pass. This breakthrough eliminates the need for expensive camera calibration or pose estimation preprocessing, making 3D reconstruction accessible for real-world applications where precise camera data isn’t available. The system outperforms existing methods even when tested on completely different datasets, demonstrating robust generalization across diverse scenarios.

“SPFSplatV2: Efficient Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views” TL;DR: feed-forward framework for 3DGS from sparse unposed views; predicts Gaussians + poses, enforces geometry via reprojection, SOTA novel view synthesis, even in extreme settings. https://ranrhuang.github.io/spfsplatv2/

Nvidia just released Lyra on Hugging Face Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation TL;DR: Feed-forward 3D and 4D scene generation from a single image/video trained with synthetic data generated by a camera-controlled video diffusion model https://x.com/_akhaliq/status/1970949464606245139

AI Visuals and Charts: Week Ending September 26, 2025

No entries found.

Top 8 Links of The Week – Organized by Category

Agents / Copilots

Introducing our Agentic Leaderboards. These new leaderboards test AI agents in real-world, high-complexity environments, setting a new standard for completing end-to-end digital tasks. https://x.com/scale_AI/status/1969416303015301128

Everyone says AI learns just like humans do – by picking up patterns and styles. But here’s what they’re missing: Can you watch every YouTube video ever made? Read every book on earth? That’s the real difference. It’s like facial recognition – sure you can spot a face in a https://x.com/bilawalsidhu/status/1969819287738307003

Anthropic

Introducing the Data Commons Model Context Protocol (MCP) Server: Streamlining Public Data Access for AI Developers – Google Developers Blog https://developers.googleblog.com/en/datacommonsmcp/

Imagery

I noticed nano banana has changed PowerPoint. People who are good at using it (& have good imaginations) can come up with genuinely funny or interesting images that make presentations more compelling and coherent – no more pasted-in cartoons. A new skill for the presenting class https://x.com/emollick/status/1968519431908180429

Media

“AI isn’t replacing radiologists” good article. Expectation: rapid progress in image recognition AI will delete radiology jobs (e.g. as famously predicted by Geoff Hinton now almost a decade ago). Reality: radiology is doing great and is growing. There are a lot of imo naive… https://x.com/karpathy/status/1971220449515516391

Microsoft AI

Satya Nadella is haunted at the prospect of Microsoft not surviving the AI era | The Verge https://www.theverge.com/tech/780946/microsoft-satya-nadella-town-hall-comments-ai-era-notepad

Tech Papers

Unlocking a Million Times More Data for AI | IFP https://ifp.org/unlocking-a-million-times-more-data-for-ai/

Video

👏We are proud to share that in an internal blind test, where professionals evaluated pairwise comparisons using a win/loss ratio benchmark, the Kling AI 2.5 Turbo model significantly outperformed Seedance 1.0, Veo 3 Fast, and Seedance 1.0 Mini in both text-to-video and https://x.com/Kling_ai/status/1970832920085753893
