This week’s cover celebrates the milestone of reaching 40,000 hand-organized links. The cover image is inspired by the odometer in the Ferrari from Ferris Bueller’s Day Off. I had Nano Banana change the odometer on a 1961 Ferrari 250 GT to read 40,000, and I changed Ferris Bueller’s T-shirt to say AI News.

AI News #107: Week Ending October 17, 2025 with 29 Executive Summaries and Top 20 Links

October 18, 2025

About This Week’s Covers

This week’s cover celebrates the milestone of reaching 40,000 hand-organized links. The cover image is inspired by the odometer in the Ferrari from Ferris Bueller’s Day Off.

I had Nano Banana change the odometer on a 1961 Ferrari 250 GT to read 40,000, and I changed Ferris Bueller’s T-shirt to say AI News.

The weekly category covers were generated by my new Python script. The script asks me a series of questions to describe the theme, then builds prompts using Claude and generates the images using Gemini, automatically. I never actually write down the theme or see the prompts, I just answer the questions. Claude summarizes the theme from the resulting JSON, and Claude reports back the theme, after the fact. This week’s theme can be summarized as:

A 1961 Ferrari 250 GT California Spyder journeying through 53 reimagined worlds, adopting new skins and environments that reflect each category while maintaining its iconic silhouette and cinematic elegance.

The results are creative, considering I gave only the category titles and the rough theme of the Ferrari, and I’m not even using the best version of Claude.

My favorite six covers are below. I love the highway made of phone glass and the sand road with sound waves.

Mobile | a translucent highway made of glowing smartphone screens and app icons

Audio | the road surface is sculpted from elegant sound wave patterns and sine curves

This Week By The Numbers

Total Organized Headlines: 464

This Week’s Executive Summaries

This week has 464 headlines and 52 of them are contributing to the executive summaries.

This week has quite a few amazing stories. We’ll start with the top stories and then go category by category. We’ll begin with Agents and General Abilities, followed by Video Models, Science News, Data Centers and Chips, Ethics and Alignment, Locally Hosted Models, and then a really neat story about my favorite topic: Segmentation and Multimodality. But first, here are the top stories.

This Week’s Top Stories

You Can Shop From Walmart In ChatGPT
The top story this week is that Walmart has partnered with OpenAI to enable shopping within ChatGPT. I’m impressed by the way the president and CEO of Walmart framed it in his announcement.

“For many years now, eCommerce shopping experiences have consisted of a search bar and a long list of item responses. That is about to change. There is a native AI experience coming that is multi-media, personalized and contextual. We are running towards that more enjoyable and convenient future with Sparky and through partnerships including this important step with OpenAI,” said Doug McMillon, President and CEO, Walmart Inc. https://corporate.walmart.com/news/2025/10/14/walmart-partners-with-openai-to-create-ai-first-shopping-experiences

“For many years now, eCommerce shopping experiences have consisted of a search bar and a long list of item responses. That is about to change. There is a native AI experience coming that is multi-media, personalized and contextual.” -Doug McMillon, President and CEO, Walmart Inc.

Google Processing 1.3 quadrillion Tokens Per Month
The second top story is a headline coming out of Google regarding usage. Google announced that they are now processing over 1.3 quadrillion tokens per month across Google.

NVIDIA Is Building Data Centers in Space
The third top story of the week is that NVIDIA announced a project to create data centers in outer space. NVIDIA is calling it StarCloud and claims it will offer 10x lower energy costs and reduce the need for energy consumption on Earth. https://blogs.nvidia.com/blog/starcloud/

StarCloud is actually a startup that is a partner in the NVIDIA Inception Program. The Inception Program is available by application. https://www.nvidia.com/en-us/startups/

StarCloud is a startup based in Redmond, Washington. The first satellite launch is planned for November. The satellite is about the size of a small refrigerator and includes NVIDIA H100 GPUs. There is no need for cooling, since space is the ultimate heat sink. There is also essentially infinite solar power.

According to StarCloud, the energy costs in space will be 10x cheaper than land-based options, even including launch expenses. I’m not seeing any information on how the data is transmitted back to Earth, but I guess that’s kind of a core competence of satellites anyway.

OpenAI Will Design Their Own Chips
The fourth top story is an announcement by OpenAI that they are going to start designing their own chips. The announcement came as part of a press release stating that OpenAI and Broadcom will be collaborating to deploy 10 gigawatts of OpenAI-designed accelerators, to be completed by the end of 2029.

“We’re designing our own chips — taking what we’ve learned from building frontier models and bringing it directly into the hardware. Building our own hardware, in addition to our other partnerships, will help all of us meet the world’s growing demand for AI.”
https://openai.com/index/openai-and-broadcom-announce-strategic-collaboration/

Agents and Abilites

SignIn with ChatGPT
The Information is reporting that OpenAI is pitching companies to enable a “Sign in with ChatGPT” option. This is similar to how Google and Facebook allow you to sign in with their credentials. Rumor has it that if a company lets users login with the OpenAI sign-in button, they can transfer the cost of using the OpenAI engine to the customer.

I’m not sure what kind of companies we’re talking about, since I don’t have a subscription to The Information. If there’s a cost associated with using the model, perhaps it’s some kind of third-party wrapper tool. TBD, but it’s an interesting headline.
https://www.theinformation.com/articles/openais-growing-ecosystem-play https://x.com/steph_palazzolo/status/1978835849379725350

Anthropic Introduces Third-Party Skills
Anthropic introduced a new feature called Skills. Skills are essentially folders of information that can be labeled, structured, and stored as references that Claude can find and use when it needs them. A skill could include instructions, scripts, resources, or other structured information.

Up until recently, Anthropic had proprietary skills that the team built themselves—examples included things like spreadsheets or presentations, where you could give Claude a skill to work with Excel or a skill to work with PowerPoint. Now, anyone can develop their own skill and use it across any of the Anthropic properties, including apps, Claude Code, and the API.

What’s cool about Skills is that they can stack together. If Claude needs one skill, it can use it, and if it needs to combine a bunch of skills, it can stack them. It’s basically like little apps. In addition to the blog announcement, the Anthropic engineering team published a detailed deep dive into how Skills work. They’re pretty fun. https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills https://claude.com/blog/skills

Claude Integration with Microsoft 365
Anthropic also announced that Claude can now connect to enterprise productivity platforms like Microsoft 365 and offer enterprise search across all connected tools. If there’s an opportunity for vertical alignment across allegiances, I can see Gemini paired with Google Docs in the Google Suite, and now Claude partnering with the Microsoft Suite. I haven’t seen any real alliances fully forming yet, other than the jagged frontier of advancements and partnerships that seem to be randomly scattered throughout the entire ecosystem.

I’m excited to start connecting my personal productivity tools into Claude and GPT. Lately, I’ve been pulled over to GPT even though I like Claude better. I just have too many stored processes and previous conversations in GPT. https://claude.com/blog/productivity-platforms https://www.youtube.com/watch?v=QTfoYDzqXn0

Microsoft Creating an AI Operating System
Mustafa Suleyman, CEO of Microsoft AI, posted that we’re one step closer to AI as an operating system, a computer you can talk to, that can see what you can see, and take action. This coincides with a blog post from Windows by Yusuf Mehdi, Executive Vice President and Consumer Chief Marketing Officer at Microsoft. Yusuf announced that the goal is to make every Windows 11 PC an AI PC.
https://x.com/mustafasuleyman/status/1978808627008847997

The first goal is to enable you to interact with your computer naturally, using either your voice or text, and have it understand you. Second, your computer will be able to see what you see and offer support. And third, your computer should be able to take action on your behalf based on voice commands.

The initial launch sounds a little bit like Alexa, where you begin by saying, “Hey Copilot,” in order to talk to your computer.

“We believe this shift to conversational input will be as transformative as the mouse and keyboard in terms of unlocking new capabilities on the PC for the broadest set of people.”
https://blogs.windows.com/windowsexperience/2025/10/16/making-every-windows-11-pc-an-ai-pc/

Google Announce Gemini Enterprise
Shifting to Google news, it’s almost uncanny timing that the same week Microsoft announced its integration with Claude and enterprise partnerships with Anthropic across the Microsoft 365 productivity toolset, Google announced Gemini Enterprise. Gemini Enterprise allows you to chat with your company’s documents, data, and apps, as well as build and deploy AI agents.

I couldn’t find a press release beyond a tweet from Sundar Pichai, so I’m assuming most of the “enterprise” functionality refers to the Google productivity suite, rather than any overlap with Microsoft or other third parties.

I’ll keep an eye out in the coming weeks.

Today we introduced Gemini Enterprise, built with our most advanced Gemini models. It allows you to chat with your company’s documents, data and apps as well as build and deploy AI agents, all grounded in your information and context.

Have a look at how it helps you build an… pic.twitter.com/4UBGIXsZBw
— Sundar Pichai (@sundarpichai) October 9, 2025

Y Combinator Announces AI Accounting Platform
Last in agent and ability news, Y Combinator announced the launch of Cranston AI, a “full-stack AI accounting platform”. The founders lean on the idea that accounting is essentially a system of rules and judgment calls that put the right information in the right place at the right time. I think as long as startups keep taking calls, asking questions, and learning from feedback, this is an incredibly ripe opportunity. We shall see. https://www.ycombinator.com/launches/OZ8-cranston-ai-a-full-stack-ai-accounting-platform

Video

OpenAI Sora Continues to Wow
Feedback keeps rolling in about OpenAI’s short video creation tool, Sora, which is able to create viral memes incredibly effectively. Ethan Mollick continues to be the king of coming up with fun ways to benchmark these tools.

One benchmark from back in May was when he tested VO3 with The Prompt—a big Broadway musical about garlic bread, complete with elaborate costumes and a Sondheim-like vibe. VO3 did a good job. Sora took it to another level.

Big progress on this important benchmark (but still weird artifacts). https://t.co/1mz8SCJisk pic.twitter.com/f9tpCi9fne
— Ethan Mollick (@emollick) October 10, 2025

Mollick also did an incredible demonstration that may be my favorite yet. One test uses the prompt: “April 1805. Napoleon is now master of Europe. Oceans are now battlefields. Ducks are now boats.”

"April 1805, Napoleon is now master of Europe. Oceans are now battlefields. Ducks are now boats" https://t.co/tE3IPlFkRs pic.twitter.com/QjwqPLo95V
— Ethan Mollick (@emollick) October 11, 2025

The second is an elaborate Regency romance where everyone is wearing a live duck for a hat, each duck is also wearing a hat, and a llama plays a flute.

Sora 2: "An elaborate Regency romance where everyone is wearing a live duck for a hat (each duck is also wearing a hat). Also a llama plays a flute." pic.twitter.com/7QJnaSIze4
— Ethan Mollick (@emollick) October 10, 2025

These are three must-see videos of the week. They do a great job of showing both the power of these video models and the creativity involved in interpreting fairly vague prompts.

Two more Sora updates from OpenAI Storyboards are now available on web to Pro users All users can now generate videos up to 15 seconds on app and web, Pro users up to 25 seconds on web

2 Sora 2 updates:

– Storyboards are now available on web to Pro users
– All users can now generate videos up to 15 seconds on app and web, Pro users up to 25 seconds on web pic.twitter.com/iINg7alWGL
— OpenAI (@OpenAI) October 16, 2025

Google Launched Veo 3.1 – With Full Audio
Bilawal Sidhu summarizes this one well:

“Veo 3.1 just dropped – Google listened and added audio everywhere it was missing. You can also insert objects into video and (soon) remove them too. Flow is quickly becoming less of a labs demo, and more of a creation tool. API available immediately; so expect to see it in your favorite AI video apps.”

In fact, he put the announcement into a Veo 3.1 clip!

Veo 3.1 just dropped – Google listened and added audio everywhere it was missing. You can also insert objects into video and (soon) remove them too.

Flow is quickly becoming less of a labs demo, and more of a creation tool. API available immediately; so expect to see it in your… pic.twitter.com/kquT2VSi1w
— Bilawal Sidhu (@bilawalsidhu) October 15, 2025

After so many serious video improvements from Google and OpenAI, it’s almost comic timing that xAI would release its video embedding and editing tool with the suggestion that all you have to do is type “add a girlfriend” to any video in the new Grok Imagine tool. Lonely, dystopian, and sophomoric to the end.

In fairness, the example they used is incredibly cute with two felt googley-eyed mini bananas wearing sweaters, drinking tea and eating muffins.

Just type “add a girlfriend” to any video on the new Grok Imagine pic.twitter.com/DDTtzPwpB1
— Elon Musk (@elonmusk) October 14, 2025

Science

Google Plans To Solve Fusion
Google DeepMind announced a research collaboration with Commonwealth Fusion Systems out of Massachusetts, with the goal of speeding up the development of fusion power using AI. DeepMind released an open-source plasma simulator called TORAX, which allows Commonwealth Fusion Systems to run millions of virtual experiments to test plans for a tokamak that CFS is branding SPARC.
https://deepmind.google/blog/bringing-ai-to-the-next-generation-of-fusion-energy/

I love that Google just drops “tokamak” as if we all know what that means. A tokamak is a donut-shaped magnetic confinement device that uses magnetic fields to contain super-hot plasma. It’s essentially the primary technology behind Google’s hopes for fusion.

The coolest part (actually very hot, ha ha) is… DeepMind is building AI agents that will act as pilots, learning how to control the plasma in real time. The agents will manage heat and maximize energy output while staying within operating limits (not blowing it up).

The goal is to keep ionized gas stable at temperatures over one million degrees Celsius within the limits of a fusion energy machine… in real life…after training in simulation.

It’s remarkably simple the way Google states the problem: manage the heat, maximize output, and don’t blow up the machine. It’s all happening inside simulated tokamaks running in parallel using an open source program called TORAX. Sounds pretty easy, when we put it that way.

GPT-5 and Gemini 2.5 Pro achieve gold medal performance in the International Olympiad of Astronomy and Astrophysics (IOAA).
Quick announcement speaks for itself: “GPT-5 and Gemini 2.5 Pro just achieved gold medal performance in the International Olympiad of Astronomy and Astrophysics (IOAA).” https://x.com/deedydas/status/1977029236390285608 https://arxiv.org/abs/2510.05016

Neurallink Patient Feeds Himself with Robot Arm Using Telepathy
Huge news from Neuralink (which uses a boatload of AI): “Neuralink’s patient-8, Nick Wray, fed himself for the first time since being paralyzed by ALS. He accomplished this by controlling a robotic arm purely with his thoughts, using the Telepathy implant.”

Neuralink's patient-8, Nick Wray, fed himself for the first time since being paralyzed by ALS.

He accomplished this by controlling a robotic arm purely with his thoughts, using the Telepathy implant.pic.twitter.com/wKSeQSmz05
— The Humanoid Hub (@TheHumanoidHub) October 11, 2025

I highly recommend the entire eight hour podcast with the Neuralink team on Lex Fridman last year. It’s time well-spent.

These engineers, developers, and surgeons are incredible. DJ Seo is COO & President of Neuralink. Matthew MacDougall is Head Neurosurgeon at Neuralink. Bliss Chapman is Brain Interface Software Lead at Neuralink. Noland Arbaugh is the first human to have a Neuralink device implanted in his brain. All are interviewed separately.

Datacenters, Chips, and Compute

We’ve already covered two of the two stories from this category: Space data centers and OpenAI building their own hardware.

There are two more important stories this week in data center, chips, and computing news.

New Alliance Forms to Fund Data Centers
NVIDIA, Microsoft, xAI, and BlackRock have formed a consortium to purchase a data center company called Aligned Data Centers (easy to guess what they do at least) for $40 billion. The sheer amount of money moving around feels like three-card Monte sometimes. Every week, there’s another $10 billion or $40 billion deal from tusual suspects.

The consortium includes MGX of Abu Dhabi, BlackRock, NVIDIA, Microsoft, and xAI, operating under the name Artificial Intelligence Infrastructure Partnership, or AIP for short. AIP was created by BlackRock, MGX, Microsoft, and NVIDIA in September 2024. The Kuwait Investment Authority, xAI, and Temasek (a global investment company based in Singapore with a portfolio valued at $434 billion) are additional participants. https://www.cnbc.com/2025/10/15/nvidia-microsoft-blackrock-aligned-data-centers.html

Meta Breaks Ground on 29th Data Center
Meta announced that they are breaking ground on a new AI-optimized data center in El Paso, Texas. The data center will scale to 1 gigawatt of power. For perspective, a gigawatt can power roughly 750,000 homes and is about the daily output of an average nuclear power plant.

The data center will support 1,800 construction jobs at its peak.

This will be Meta’s 29th data center!!

Meta plans to use a closed-loop, liquid-cooled system that will use zero water for “most of the year”. Meta also plans to be water-positive by 2030, and in El Paso Meta claims they will restore 200% of the water consumed by the data center back to local watersheds.

More impressively, Meta claims the data center’s electricity will be matched with 100% clean, renewable energy, and that Meta will pay for the new grid infrastructure required to connect the facility. I’ll believe it when I see it. https://about.fb.com/news/2025/10/metas-new-ai-optimized-data-center-el-paso/

Ethics and Alignment News

Blue Collar Workers Embracing AI
AI has started to appear in blue-collar work. A survey of tradespeople across North America found that over 70% of respondents have tried AI tools, and 40% actively use them. Plumbers were the most likely to say AI has helped their business grow. Cleaners were the biggest adopters of AI, while electricians reported the highest satisfaction rates.

Plumbers, in particular, use AI for tricky, unknown issues. They can take a picture of a broken water heater, for example, or write observations into a prompt, and ChatGPT becomes a useful brainstorming partner. Offices are using AI to field customer service requests and complete surveys that help with initial diagnoses, pulling up technical information within seconds. Tasks that once required flipping through multiple 60-page manuals can now be done almost instantly. https://x.com/kimmonismus/status/1976932982746497380 https://edition.cnn.com/2025/10/10/tech/ai-chatgpt-blue-collar-jobs

GPT 5 Pro Can Review PhD Math Papers
Paata Ivanisvili Professor of Mathematics at UC Irvine and former postdoc at Princeton observes: “GPT 5 Pro is extremely good in identifying serious gaps in published papers.” https://x.com/PI010101/status/1977117411603366363/photo/2 https://x.com/PI010101/status/1977117411603366363

Anthropic Posts Detailed Regulatory Advice
In September, Anthropic created the Economic Index to better understand AI’s effect on the economy. The website is comprehensive and really neat to see as an interactive resource, and it’s grown quite a bit since I first saw it. Anthropic has been keeping a close eye on it, and they posted a major update this week.

They noticed an important shift in usage: users are becoming increasingly likely to delegate full tasks to Claude and collaborate less with it. As AI models are able to work independently for longer periods of time, this trend will likely accelerate.

Anthropic attempts to address this shift, and their post explores nine categories of policy ideas covering workforce development, permit reform, fiscal policy, and social services. The nine policies are as follows:

1. Invest in upskilling through workforce training grants
2. Reform tax incentives for worker retention and retraining
3. Close corporate loopholes
4. Accelerate permits and approvals for AI infrastructure

This one is notable, since people often pick on Anthropic for being decelerationist, and this clearly aims to keep things moving.

5. Establish trade adjustment assistance for AI displacement
6. Implement taxes on compute or token generation
7. Create national sovereign wealth funds with a stake in AI
8. Adopt or modernize value-added taxes
9. Implement new revenue structures to account for AI’s growing share of the economy

All of these are worth reading, as Anthropic has put a lot of thought into fleshing out each idea individually. I’ve always admired Anthropic’s commitment to this discussion, and I take them in good faith. They also happen to have the strongest model in the world—even though they don’t constantly beat their chests about it.
https://www.anthropic.com/economic-index https://www.anthropic.com/research/economic-policy-responses

Experts Trying AI Save Time Even If It Fails Occassionally
Ethan Mollick points out that OpenAI’s GDPVal suggests “Experts should try using AI a couple times on any task, and then resort to doing it themselves (with appropriate minor AI assistance) if they can’t get AI to work for them. You still save time overall, even when AI fails on some cases.” https://x.com/emollick/status/1977874249214779558

AI Posting More Content To The Web Than Humans
A study shows that more articles are posted on the web by artificial intelligence than by humans. The quantity of AI-generated articles has surpassed the quantity of human-written articles published on the web. Interestingly, the proportion of AI-generated articles plateaued in May 2024 (I find that only suggests people and AI are getting better at co-writing and prompting).
https://graphite.io/five-percent/more-articles-are-now-created-by-ai-than-humans

The study finds that purely AI-generated articles do not seem to be showing up prominently in Google or ChatGPT results.

The study dives into the phenomenon that, since November 2022, companies have increasingly published content to grow traffic across marketing channels like search, social, and advertising to avoid paying humans.

A separate MIT study showed that in many cases, AI-generated content is as good as, or even better than, content written by humans. Another study found that people now struggle to distinguish AI-generated content from human-written work.

This is a really neat study by the Graphite.io team. Graphite.io does traditional search engine optimization, but also AI-recommended optimization, aka Answer Engine Optimization, or AEO. I’m surprised they can successfully pull off AEO at all. This is a tunnel worth going down if you’re into organic marketing.

OpenAI To Enable Erotica and Age-Gating
Sam Altman announced this week that he is confident OpenAI can mitigate the serious mental health issues that have emerged over the past two years. Users (often minors) have leaned too heavily on ChatGPT for companionship and therapy, with horrible outcomes.

Sam is so confident in the age-gating and mental health alignments, that OpenAI plans to release a new version of ChatGPT in December that will include age gating and as a result even allow erotica for verified adults.

This caused a bit of a stir, but the confidence around age gating is a good sign. The converse is also true: if OpenAI can verify, or reasonably assume, that someone is under 18, they can enforce age gating and put proper controls in place. It will be interesting to see how this is implemented in December.
https://x.com/sama/status/1978129344598827128
https://www.engadget.com/ai/openai-will-let-adults-use-chatgpt-for-erotica-starting-in-december-182417583.html

Locally Hosted Models

Meta Releases MobileLLM-Pro
Meta released a new model called MobileLLM-Pro on Hugging Face. This is a 1-billion-parameter foundation language model that, to my knowledge, is not multimodal. It appears to be focused on general language tasks and high-quality on-device inference on mobile phones.

Inference is essentially the ability for an LLM to put into practice what it’s been trained on…in the real world. One simple example is an email spam filter being tested with real emails for the first time.

A locally hosted model, in particular, can’t rely on the cloud for research or answers, so inference becomes especially important. The model needs to be self-sufficient on the device.

There are two versions of the model. One is a pre-trained base model with options for fine tuning, and the other is a pre-tuned version that’s a bit stronger at tool calling and question answering. It can also handle rewriting and summarization.

Currently, Meta’s MobileLLM-Pro model is outperforming Gemma 3.1B (Google) and LLaMA 3.2 1B (Meta) by roughly 5–7%. This is a very important, if not especially flashy, area to follow, particularly when we see Meta working on things like on-device segmentation models.

Once these various models can start talking to each other, things are going to get crazy very quickly.

https://ai.meta.com/sam2/ https://x.com/_akhaliq/status/1978916251456925757 https://huggingface.co/facebook/MobileLLM-Pro

Google and Yale Train Local Model That Finds Cancer Discoveries
“Google and Yale scientists have trained an LLM that has generated a novel hypothesis about cancer cellular behavior. This prediction was confirmed multiple times in vitro.”

“The model that generated this prediction is a 27B-parameter LLM based on the Google Gemma open source models, and trained on a corpus comprising >1B tokens of transcriptomic data, biological text, and metadata. Quite remarkable that a small (just 27B) LLM trained on specialized data is able to make novel scientific discoveries.”

“We found out today that an LLM that fits on a high-end consumer GPU, when trained on specific biological data, can discover a novel method to make cancer tumors more responsive to immunotherapy.

Confirmed novel discovery (not present in existing literature). Experimentally validated in living cells.” https://x.com/deredleritt3r/status/1978561622932164905 https://blog.google/technology/ai/google-gemma-ai-cancer-therapy-discovery/

Fine Tune Qwen on A MacBook
If you’re a bit more technical, you may enjoy this post about fine-tuning Qwen 3-0.6B on a MacBook. https://x.com/ModelScope2022/status/1977706364563865805

Locally Hosted World Models!
Spatial Intelligence company World Labs specializes in building what’s called world models. These models can perceive, generate, reason, and interact with 3D worlds, whether they exist in real life or virtually. The implications are pretty much limited only by your imagination, whether that’s embodied robots navigating a house, remote teleoperation, or immersive simulations through headsets.

Diffusing frames rather than rendering them individually is a newer technique, and World Labs introduced a model called RTFM, short for real-time frame model, that does exactly this. It refuses to read instructions, sadly (dad joke, IYKYK). RTFM generates video frames in real time as you interact with it. What’s really amazing is that it’s powered by only a single NVIDIA H100 GPU and can render persistent, 3D-consistent worlds. This has been done before with Google’s Genie models (among others perhaps), but to my knowledge, World Labs is the first to fit this capability onto a single H100. https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/

You may not have heard much about World Labs yet. However, their co-founder and CEO, Fei-Fei Li, is a professor at Stanford and the director of the Human-Centered AI program at Stanford. She is one of the most famous and influential AI researchers in the world and also runs the Stanford Vision and Learning Lab.

https://x.com/theworldlabs/status/1978839171058815380 https://www.worldlabs.ai/blog/rtfm https://x.com/jcjohnss/status/1978842517605843391 https://x.com/drfeifei https://svl.stanford.edu/ https://x.com/StanfordHAI https://hai.stanford.edu/

Multimodal News

ByteDance Releases Open Source Segmentation Model
Last this week is a headline from the world of multimodality, which is quite possibly my favorite sub-genre of artificial intelligence. ByteDance, the parent company of TikTok, released Sa2VA on Hugging Face. This is a multimodal language model that combines Segment Anything 2 with the LLaVA model.

LLaVA is a combined multimodal vision encoder and language model designed to understand visual questions and provide answers about videos and imagery.

This pairing with Sa2VA + LLaVA makes a lot of sense: segmentation can identify and track objects, and the large language vision model can further explain what’s happening within those segmentations.

I saved this one for last because it’s not necessarily the most exciting topic for laypeople. But open-source segmentation, vision, and image understanding are going to transform everything—from how agents operate computers and browse the internet, to embodied robots, home security systems, equipment monitoring, traffic cameras, livestock management, and inventory tracking.

It’s a powerful example of the sum being greater than the parts. These models can be hard to wrap your head around, but I highly recommend at least skimming the material and googling what you don’t understand. It will be rewarding down the road.

https://x.com/HuggingPapers/status/1978745567258829153 https://huggingface.co/ByteDance/Sa2VA-InternVL3-14B https://medium.com/@ud.uddeshya16/introduction-to-llava-a-multimodal-ai-model-2a2fa530ace4 https://ollama.com/library/llava https://www.microsoft.com/en-us/research/project/llava-large-language-and-vision-assistant/

This Week’s Humanities Reading

In the spirit of this week’s Ferris Bueller’s Day Off theme, the Humanities reading comes from Ferris himself:

“Life moves pretty fast. If you don’t stop and look around once in a while, you could miss it.”

Full Executive Summaries with Links, Generated by Claude Sonnet 4

Walmart partners with OpenAI for direct ChatGPT shopping purchases
Walmart struck a deal allowing customers to buy items directly through ChatGPT’s new Instant Checkout feature, marking a significant shift as traditional retailers embrace AI-powered commerce. This move puts Walmart ahead of Amazon, which reportedly blocks ChatGPT agents from accessing its site, reversing the typical dynamic where Amazon leads e-commerce innovation. Walmart’s stock jumped nearly 5% on the news, hitting a 52-week high as investors recognized the potential of AI-native shopping experiences.

Walmart is moving very fast in AI. Amazon still seems to block ChatGPT agents from even visiting its site. Interesting reversal in agentic commerce, a lesson learned from e-commerce where Amazon moved fast?”” / X https://x.com/emollick/status/1978130496207888717

ChatGPT instant checkout for Walmart:”” / X https://x.com/gdb/status/1978123494870196228

Walmart teams up with OpenAI to allow purchases in ChatGPT https://www.cnbc.com/2025/10/14/walmart-openai-chatgpt-shopping.html

welcome @Walmart to instant checkout 🤝”” / X https://x.com/bradlightcap/status/1978116720171643127

OpenAI pitches “sign in with ChatGPT” to shift AI costs to users
OpenAI is offering companies a login system similar to Google or Facebook sign-in, but with a twist: businesses can pass their AI model usage costs directly to customers who use the service. This represents a significant shift from the current model where companies absorb AI expenses, potentially making advanced AI features more accessible to smaller businesses while creating a new revenue stream for OpenAI.

OpenAI is pitching companies on adding a “”sign in with ChatGPT”” option to their sites, similar to how you might sign in with Google or Facebook. Part of the pitch: Companies that agree can transfer the costs of using OpenAI’s models to their customers. https://x.com/steph_palazzolo/status/1978835849379725350

Anthropic launches Agent Skills to make Claude specialized on demand
Anthropic introduced Agent Skills, a system that packages domain expertise into reusable folders containing instructions, scripts, and resources that Claude loads only when relevant to specific tasks. Unlike traditional AI improvements, Skills use progressive disclosure—loading minimal metadata first, then full details only if needed—making them token-efficient and scalable. Early testing shows Skills produce more precise outputs by providing structured, specialized context rather than relying on general AI capabilities alone.

Equipping agents for the real world with Agent Skills \ Anthropic https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills

Claude Skills are awesome, maybe a bigger deal than MCP https://simonwillison.net/2025/Oct/16/claude-skills/

Claude Skills: Customize AI for your workflows \ Anthropic https://www.anthropic.com/news/skills

Today we’re introducing Skills in claude dot ai, Claude Code, and the API. Skills let you package specialized knowledge into reusable capabilities that Claude loads on demand as agents tackle more complex tasks. Here’s how they work and why they matter for the future of agents: https://x.com/alexalbert__/status/1978877498411880550

I am not going to lie. I see a lot of potential in the Skills feature that Anthropic just dropped! Just tested with Claude Code. It leads to sharper and precise outputs. It’s structured context engineering to power CC with specialized capabilities, leveraging the filesystem. https://x.com/omarsar0/status/1978919087137804567

Claude now connects directly to Microsoft 365 for enterprise search
Anthropic launched Claude integrations with SharePoint, Outlook, and Teams that let the AI assistant search across company documents, emails, and chat histories to answer questions using organizational knowledge rather than just individual files.

Claude and your productivity platforms \ Anthropic https://www.anthropic.com/news/productivity-platforms

Microsoft makes every Windows 11 PC an AI-powered computer
Microsoft is integrating Copilot AI directly into Windows 11 with voice commands, screen analysis, and automated actions, transforming how people interact with their computers. The update introduces “Hey Copilot” voice activation, Copilot Vision that can see and analyze screen content, and experimental features that let AI take actions on files and applications with user permission. This represents a fundamental shift from traditional mouse-and-keyboard computing to conversational AI interaction, with Microsoft reporting that voice users engage twice as much as text users.

Today, we’re one step closer to AI as an operating system. A computer you can talk to, that can see what you see, and take action – all with your permission, all more intuitive than ever. Vision now GA globally + more on today’s @Windows blog: https://x.com/mustafasuleyman/status/1978808627008847997

Making every Windows 11 PC an AI PC | Windows Experience Blog https://blogs.windows.com/windowsexperience/2025/10/16/making-every-windows-11-pc-an-ai-pc/

Google launches Gemini Enterprise for workplace AI integration
Google’s new enterprise AI service lets employees chat with company documents and build custom AI agents using their organization’s data, marking a shift from general-purpose AI tools to business-specific applications. This represents a significant move toward AI systems that understand company context rather than just providing generic responses.

Today we introduced Gemini Enterprise, built with our most advanced Gemini models. It allows you to chat with your company’s documents, data and apps as well as build and deploy AI agents, all grounded in your information and context. Have a look at how it helps you build an https://x.com/sundarpichai/status/1976338416611578298

Google processes over 1.3 quadrillion AI tokens monthly across services
Google revealed the massive scale of its AI operations, processing more than 1.3 quadrillion tokens per month across its platforms. This figure demonstrates how deeply AI has penetrated Google’s ecosystem, from search to cloud services, representing one of the largest disclosed AI processing volumes by any company. The disclosure signals both Google’s current AI dominance and hints at significant expansion plans ahead.

Over 1.3 quadrillion tokens a month across Google, so much progress : ) so much more to go! https://x.com/OfficialLoganK/status/1976359039581012127

AI startup files complete corporate tax returns directly with IRS
Cranston AI’s software agents gather financial data across businesses and prepare full tax filings for human review before submission, marking a shift from AI assistance to AI execution in professional services that typically require certified expertise.

Cranston AI (@cranston_ai) does your company’s bookkeeping & taxes with AI. Their agents pull in context from across the business and, after human review, file a full corporate tax return with the IRS. https://x.com/ycombinator/status/1975591950255358411

OpenAI’s Sora 2 ties for first place in video generation rankings
Sora 2 Pro matched Google’s Veo 3 for top performance on Video Arena’s leaderboard, while regular Sora 2 claimed third place among text-to-video AI models. The milestone matters because it shows OpenAI has caught up to Google in video AI quality, with 70% of Sora’s 2 million weekly users actively creating content rather than just experimenting. Both versions now support longer video generation up to 15-25 seconds.

Big progress on this important benchmark (but still weird artifacts). https://x.com/emollick/status/1976702663330038205

two big sora updates for creators: in the sora app: you can now generate 15sec natively. tap the model selector at the top of the screen to change duration. (it will initially use two of your gens/day, we will make this clearer in the UI soon). on web: we have an awesome new”” / X https://x.com/billpeeb/status/1978662020947087869

2 Sora 2 updates: – Storyboards are now available on web to Pro users – All users can now generate videos up to 15 seconds on app and web, Pro users up to 25 seconds on web https://x.com/OpenAI/status/1978661828419822066

🚨 🎬 Video Arena Disrupted! @Openai’s Sora 2 and Sora 2 Pro have landed on the Text-to-Video leaderboard. 🏆 Sora 2 Pro is the first to tie rank with Veo 3 variants for #1. 🥉 Sora 2 comes in at #3, pushing the non-audio variants of Veo 3 into 5th! Video models with audio https://x.com/arena/status/1978149396996051007

OpenAI’s Head of Sora @billpeeb says a stunning 70% of Sora’s nearly 2 million weekly active users are creating content. https://x.com/tbpn/status/1976759087456305191

April 1805, Napoleon is now master of Europe. Oceans are now battlefields. Ducks are now boats”” https://x.com/emollick/status/1976855737289916922

Google releases Veo 3.1 video generator with improved audio and visuals
Google’s Veo 3.1 generates 8-second, 720p videos with synchronized audio through Google Vids, marking a significant upgrade over its predecessor with better prompt accuracy and visual detail. Early tests show the model produces more creative and nuanced outputs compared to Veo 3, addressing previous issues like object proportion problems while directly competing with OpenAI’s Sora 2. The staged rollout through Google Vids and Vertex AI suggests broader availability is imminent, positioning Google to challenge OpenAI’s dominance in AI video generation.

Veo 3.1 just dropped – Google listened and added audio everywhere it was missing. You can also insert objects into video and (soon) remove them too. Flow is quickly becoming less of a labs demo, and more of a creation tool. API available immediately; so expect to see it in your https://x.com/bilawalsidhu/status/1978497357760311500

Exclusive: First real samples of Veo 3.1 generated videos https://www.testingcatalog.com/first-real-samples-of-veo-3-1-generated-videos/#google_vignette

Grok adds AI girlfriend feature to any video with simple text command
Elon Musk’s Grok AI now lets users add virtual girlfriends to videos by typing “add a girlfriend,” marking a shift toward AI companionship features in mainstream platforms. This represents the first major social media integration of AI romantic partners, potentially normalizing digital relationships and raising questions about social interaction patterns.

Just type “add a girlfriend” to any video on the new Grok Imagine https://x.com/elonmusk/status/1977982448861381081

AI models win gold medals in international science olympiads
GPT-5 and Gemini 2.5 Pro achieved gold-level performance in the International Olympiad of Astronomy and Astrophysics, joining recent victories in math and computer science competitions. This marks a dramatic leap from AI systems that struggled with basic math just a year ago to now matching the world’s top high school students in advanced STEM reasoning. The breakthrough suggests AI has crossed a threshold in complex problem-solving that could reshape scientific research and education.

GPT-5 and Gemini 2.5 Pro just achieved gold medal performance in the International Olympiad of Astronomy and Astrophysics (IOAA). AI is now world class at cutting edge physics. https://x.com/deedydas/status/1977029236390285608

I don’t think people have updated enough on the capability gain in LLMs, which (despite being bad at math a year ago) now dominate hard STEM contests: The International Math Olympiad, the International Olympiad on Astronomy & Astrophysics, International Informatics Olympiad… https://x.com/emollick/status/1977460160197956089

Google DeepMind partners with fusion company to accelerate clean energy
The AI lab is collaborating with Commonwealth Fusion Systems to use artificial intelligence to speed up development of nuclear fusion power, potentially helping solve one of humanity’s biggest energy challenges by making the complex physics calculations needed for fusion more efficient.

We’re announcing a research collaboration with @CFS_energy, one of the world’s leading nuclear fusion companies. Together, we’re helping speed up the development of clean, safe, limitless fusion power with AI. ⚛️ https://x.com/GoogleDeepMind/status/1978808994811588666

Google DeepMind is bringing AI to the next generation of fusion energy – Google DeepMind https://deepmind.google/blog/bringing-ai-to-the-next-generation-of-fusion-energy/

Paralyzed ALS patient feeds himself using thought-controlled robotic arm
Nick Wray became the first person to use Neuralink’s brain implant to control a robotic arm for self-feeding, marking a breakthrough in restoring independence for paralyzed patients. This represents a significant advance beyond previous brain-computer interfaces that only controlled computer cursors, demonstrating direct neural control of physical robotics for daily activities.

Neuralink’s patient-8, Nick Wray, fed himself for the first time since being paralyzed by ALS. He accomplished this by controlling a robotic arm purely with his thoughts, using the Telepathy implant. https://x.com/TheHumanoidHub/status/1976891521363591588

Neuralink on X: “ALS took Nick’s arm mobility. Now, he can control a robotic arm with his Neuralink device to feed himself. “Life with my BCI has been and continues to be so surreal and so rewarding. Can’t wait to see what comes next!” https://t.co/jInvprOyAr” / X https://x.com/neuralink/status/1976803020190236915

Starcloud plans to launch data centers into orbit by 2025
The startup aims to solve Earth-based computing limitations by placing servers in space, where unlimited solar power and natural cooling could enable more efficient AI processing. This represents a novel approach to scaling compute infrastructure beyond terrestrial constraints, though the technical and economic viability remains unproven.

How Starcloud Is Bringing Data Centers to Outer Space | NVIDIA Blog https://blogs.nvidia.com/blog/starcloud/

OpenAI finalizes custom chip design to reduce Nvidia dependence
OpenAI will complete its first in-house AI chip design within months and send it to Taiwan Semiconductor for manufacturing using advanced 3-nanometer technology, with mass production targeted for 2026. This marks a significant shift as the ChatGPT maker joins tech giants like Microsoft and Meta in developing custom silicon to reduce reliance on Nvidia’s dominant 80% market share. The chip, designed by a 40-person team led by former Google executive Richard Ho, will initially focus on running AI models rather than training them, giving OpenAI more negotiating leverage with suppliers.

Announcing partnership with @Broadcom to build an OpenAI chip. This deal is on top of the @nvidia and @AMD ones we’ve announced over the past few weeks, and will allow us to customize performance for specific workloads. The world needs more compute.”” / X https://x.com/gdb/status/1977739645040378267

Exclusive: OpenAI set to finalize first custom chip design this year | Reuters https://www.reuters.com/technology/openai-set-finalize-first-custom-chip-design-this-year-2025-02-10/

We’re partnering with Broadcom to deploy 10GW of chips designed by OpenAI. Building our own hardware, in addition to our other partnerships, will help all of us meet the world’s growing demand for AI. https://x.com/OpenAINewsroom/status/1977724753705132314

We’re designing our own chips — taking what we’ve learned from building frontier models and bringing it directly into the hardware. Building our own hardware, in addition to our other partnerships, will help all of us meet the world’s growing demand for AI. In Episode 8 of the https://x.com/OpenAI/status/1977794196955374000

Really happy to be announcing the chips we’ve been cooking the past 18 months! OpenAI kicked off the reasoning wave with o1, but months before that we’d already started designing a chip tuned precisely for reasoning inference of OpenAI models. In January 2024, I joined OpenAI as”” / X https://x.com/itsclivetime/status/1977772728850817263

OpenAI and Broadcom announce strategic collaboration to deploy 10 gigawatts of OpenAI-designed AI accelerators | OpenAI https://openai.com/index/openai-and-broadcom-announce-strategic-collaboration/

Tech giants buy massive data center company for $40 billion
A consortium led by Nvidia, Microsoft, BlackRock and Elon Musk’s xAI agreed to purchase Aligned Data Centers for $40 billion, marking the largest global data center deal ever. The acquisition reflects the urgent need for AI infrastructure as companies race to build computing capacity for training and running AI models. Aligned operates 50 data center campuses with over 5 gigawatts of capacity across the Americas, providing the massive facilities needed to house AI hardware.

Nvidia, Microsoft, BlackRock part of $40B Aligned Data Centers deal https://www.cnbc.com/2025/10/15/nvidia-microsoft-blackrock-aligned-data-centers.html

Meta breaks ground on massive 1-gigawatt AI data center in El Paso
The $1.5 billion facility represents Meta’s 29th data center and largest AI infrastructure investment, designed specifically to handle the massive computing demands of training and running advanced AI models. The project will create 1,800 construction jobs and 100 permanent positions while using 100% renewable energy and innovative water-free cooling systems. This scale of dedicated AI infrastructure signals how tech giants are racing to build the computing power needed for next-generation artificial intelligence capabilities.

Breaking Ground on Our New AI-Optimized Data Center in El Paso https://about.fb.com/news/2025/10/metas-new-ai-optimized-data-center-el-paso/

Blue-collar workers embrace AI more than expected, survey finds
A new survey reveals that plumbers, cleaners, and electricians are leading AI adoption in unexpected ways, with plumbers most likely to credit AI for business growth and cleaners showing the highest usage rates. This challenges assumptions that AI primarily benefits white-collar knowledge workers, suggesting the technology’s practical applications extend far beyond office environments into hands-on trades.

1/ That came as a surprise to me: “”AI’s integration and impact varies by industry, according to the survey; Plumbers were the most likely to say AI has helped their business grow; cleaners were the “biggest adopters of AI”; while electricians had “the highest satisfaction rates” https://x.com/kimmonismus/status/1976932982746497380 https://edition.cnn.com/2025/10/10/tech/ai-chatgpt-blue-collar-jobs

GPT-5 Pro demonstrates advanced scientific paper review capabilities
OpenAI’s latest model can identify significant flaws and gaps in published research papers with high accuracy. This represents a major leap in AI’s ability to assist with scientific peer review and quality control, potentially transforming how academic research is evaluated and validated across disciplines.

GPT 5 Pro is extremely good in identifying serious gaps in published papers. https://x.com/PI010101/status/1977117411603366363

Anthropic explores nine policy ideas to manage AI’s economic disruption
As AI systems increasingly handle full tasks independently rather than collaborating with humans, Anthropic has outlined policy responses ranging from workforce retraining grants for modest disruption to sovereign wealth funds and new tax structures for scenarios with dramatic job losses. The company’s Economic Index shows users are delegating complete tasks to Claude more frequently, signaling a shift toward AI autonomy that could accelerate workforce displacement. These proposals span three scenarios based on disruption severity, with some policies like compute taxes directly impacting Anthropic’s own revenue.

Preparing for AI’s economic impact: exploring policy responses \ Anthropic https://www.anthropic.com/research/economic-policy-responses

AI tools save experts time even when they frequently fail
A new study called GDPEval found that experts should attempt AI assistance 2-3 times on any task before reverting to manual work with minor AI support. This approach delivers net time savings despite AI’s inconsistent performance, suggesting the optimal strategy isn’t perfect AI reliability but strategic trial-and-error usage.

This matches what the GDPEval paper found. Experts should try using AI a couple times on any task, and then resort to doing it themselves (with appropriate minor AI assistance) if they can’t get AI to work for them. You still save time overall, even when AI fails on some cases. https://x.com/emollick/status/1977874249214779558

AI-generated articles now outnumber human-written ones on the web
A study analyzing 65,000 web articles found that AI-generated content surpassed human-written articles in November 2024, growing from nearly zero to over 50% since ChatGPT’s launch in late 2022. However, this growth has plateaued since May 2024, likely because AI content performs poorly in search results. The researchers used Surfer’s AI detection tool with a 4.2% false positive rate to classify content, revealing a dramatic shift in web publishing patterns within just two years.

More Articles Are Now Created by AI Than Humans https://graphite.io/five-percent/more-articles-are-now-created-by-ai-than-humans

OpenAI will allow adults to use ChatGPT for erotica this December
The company is implementing age verification and parental controls to separate adult and child users, enabling more permissive content policies for verified adults while maintaining safety restrictions for minors. This marks a significant shift from ChatGPT’s historically restrictive approach, which OpenAI says was designed to address mental health concerns but made the system less useful for many users. The change reflects OpenAI’s new “treat adult users like adults” principle and comes alongside broader policy updates allowing mature applications on the platform.

Ok this tweet about upcoming changes to ChatGPT blew up on the erotica point much more than I thought it was going to! It was meant to be just one example of us allowing more user freedom for adults. Here is an effort to better communicate it: As we have said earlier, we are”” / X https://x.com/sama/status/1978539332215681076

OpenAI will let adults use ChatGPT for erotica starting in December https://www.engadget.com/ai/openai-will-let-adults-use-chatgpt-for-erotica-starting-in-december-182417583.html

Sam Altman on X: “We made ChatGPT pretty restrictive to make sure we were being careful with mental health issues. We realize this made it less useful/enjoyable to many users who had no mental health problems, but given the seriousness of the issue we wanted to get this right. Now that we have” / X https://x.com/sama/status/1978129344598827128

Meta releases 1B parameter language model for smartphones
Meta’s MobileLLM-Pro brings AI capabilities directly to mobile devices without requiring cloud connectivity, marking a significant shift toward privacy-focused AI that can run locally on smartphones. This addresses growing concerns about data privacy and internet dependency while making AI more accessible in areas with poor connectivity.

Meta just dropped MobileLLM-Pro on Hugging Face a 1B foundational language model in the MobileLLM series, designed to deliver high-quality, efficient on-device inference across a wide range of general language modeling tasks two variants of the model: A pre-trained base model https://x.com/_akhaliq/status/1978916251456925757

Google’s AI model discovers new cancer immunotherapy drug combination
Google’s C2S-Scale model, built on consumer-grade hardware, identified that combining silmitasertib with interferon increases tumor visibility to immune systems by 50% in lab tests. This represents the first time an AI model has generated and experimentally validated a completely novel cancer therapy hypothesis not found in existing medical literature. The discovery demonstrates AI’s potential to accelerate drug discovery by finding unexpected drug combinations that work synergistically in specific biological contexts.

Just to recap: We found out today that an LLM that fits on a high-end consumer GPU, when trained on specific biological data, can discover a novel method to make cancer tumors more responsive to immunotherapy. Confirmed novel discovery (not present in existing literature).”” / X https://x.com/deredleritt3r/status/1978561622932164905

Google’s Gemma AI model helps discover new potential cancer therapy pathway https://blog.google/technology/ai/google-gemma-ai-cancer-therapy-discovery/

Apple’s MLX framework enables 400-token-per-second AI model training on MacBooks
A researcher fine-tuned a 600-million parameter language model in under two minutes on consumer Apple hardware, demonstrating that serious AI development no longer requires expensive specialized equipment. This breakthrough could democratize AI experimentation by making powerful model training accessible to developers using standard laptops.

🤯 400 Token/S on a MacBook? Yes, you read that right! Shaohong Chen just fine-tuned the Qwen3-0.6B LLM in under 2 minutes using Apple’s MLX framework. This is how you turn your MacBook into a serious LLM development rig. A step-by-step guide and performance metrics inside! 🧵 https://x.com/ModelScope2022/status/1977706364563865805

World Labs launches real-time 3D world generator on single GPU
World Labs released RTFM, a generative AI model that creates interactive 3D worlds in real-time using just one H100 GPU chip. Unlike traditional 3D graphics that build explicit geometric models, RTFM learns to render complex visual effects like reflections and shadows directly from video data. The system maintains persistent worlds that users can explore indefinitely without forgetting previous areas, addressing a key limitation of earlier generative world models.

Very excited to share @theworldlabs ‘s latest research work RTFM!! It’s a real-time, persistent, and 3D consistent generative World Model running on *a single* H100 GPU! Blog and live demo are available below! 🤩”” / X https://x.com/drfeifei/status/1978840835341914164

Introducing RTFM (Real-Time Frame Model): a highly efficient World Model that generates video frames in real time as you interact with it, powered by a single H100 GPU. RTFM renders persistent and 3D consistent worlds, both real and imaginary. Try our demo of RTFM today! https://x.com/theworldlabs/status/1978839171058815380

RTFM: A Real-Time Frame Model | World Labs https://www.worldlabs.ai/blog/rtfm

We are sharing a research preview of our latest model from @theworldlabs! RTFM is an autoregressive diffusion transformer trained on large-scale video data. It generates video frames in real-time without building an explicit 3D model of the world. Try the demo today!”” / X https://x.com/jcjohnss/status/1978842517605843391

Researchers create first unified AI model for precise object identification in images and videos
Sa2VA combines two leading AI systems to identify and outline specific objects in visual content based on text descriptions, achieving breakthrough performance across multiple tasks. This matters because current AI models typically handle either images or videos separately, while Sa2VA works seamlessly across both formats with minimal training. The team validated their approach with a new dataset of 72,000 labeled video objects and demonstrated superior results in complex real-world scenarios.

[2501.04001] Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos https://arxiv.org/abs/2501.04001

ByteDance just released Sa2VA on Hugging Face. This MLLM marries SAM2 with LLaVA for dense grounded understanding of images & videos, offering SOTA performance in segmentation, grounding, and QA. https://x.com/HuggingPapers/status/1978745567258829153

There’s Only One Lonely AI Visual: Week Ending October 17, 2025

Sora 2: “”An elaborate Regency romance where everyone is wearing a live duck for a hat (each duck is also wearing a hat). Also a llama plays a flute.”” https://x.com/emollick/status/1976731641193419139