I took a short break to update my core presentation on AI trends. Last week, I delivered it to Burris Logistics in Milford, Delaware, which was a cool full-circle moment.
My dad came from a modest Milford family and modeled his entire adult life after the standard of Jack Burris: manners, family, professionalism, integrity.
My new presentation runs 262 slides and gives a comprehensive overview of the major stories, news, and themes from the past 12 months. I encourage everyone to take a look, and I’m always happy to walk through it if you reach out; it goes better with narration and explanation.
About This Week’s Covers
This week’s cover celebrates Google’s world-simulation model, Genie, as well as Figure’s Helix 02 vision-language-action model and xAI’s new Grok Imagine image and video generation tool.
For the Genie image, I gave the prompt to Grok, since Grok generally has no copyright guardrails. I had Google Gemini swap in a Figure robot. I then used Photoshop to add the title text and the Google logo.

To test Grok Imagine’s video tool, I ran the static Genie image through it to see how it would do. The result is a decent short animation, with sound that’s just OK. I’m including a video of a few clips below. I had to re-add the title text using Premiere.
I tried to create this week’s category covers using my normal Python script, which leverages Google Gemini’s image-creation tool. However, Gemini really resisted creating Disney’s Genie: the script would fail and get rejected, so I had to run it several times. Eventually, brute force (exploiting AI’s run-to-run inconsistency) got me all but nine of the images, and I used an alternate prompt system for the last nine. Those stand out because their genie is distinctly no longer Disney’s.
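For the curious, the brute-force loop is roughly the shape below. This is a minimal sketch, not my actual script: it assumes the google-genai Python SDK, and the model name, category list, prompts, and retry count are all illustrative.

```python
# Minimal sketch of the cover-generation retry loop, assuming the
# google-genai SDK. Model name, categories, prompts, and retry count
# are illustrative, not my production script.
from google import genai

client = genai.Client()  # reads the API key from the environment

CATEGORIES = ["Anthropic", "Google", "OpenAI", "Robotics Embodiment"]
PROMPT = "Magazine cover of a genie bursting from a lamp, themed: {cat}"
FALLBACK = "Magazine cover of a swirling blue smoke spirit, themed: {cat}"

def make_cover(cat: str, retries: int = 5) -> bytes:
    for attempt in range(retries):
        try:
            resp = client.models.generate_images(
                model="imagen-3.0-generate-002",
                prompt=PROMPT.format(cat=cat),
            )
            return resp.generated_images[0].image.image_bytes
        except Exception as err:  # guardrail rejections surface as errors
            print(f"{cat}: attempt {attempt + 1} rejected ({err})")
    # Brute force failed; switch to the alternate, less-Disney prompt.
    resp = client.models.generate_images(
        model="imagen-3.0-generate-002", prompt=FALLBACK.format(cat=cat)
    )
    return resp.generated_images[0].image.image_bytes

for cat in CATEGORIES:
    with open(f"cover_{cat}.png", "wb") as f:
        f.write(make_cover(cat))
```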
I’ve put a few of my favorites below. When Gemini ignores its guardrails, it’s pretty clear just how powerful image tools could be if they didn’t have alignment restrictions. Just look at the Hugging Face or Apple covers.
This week’s humanities reading is a quote from Genie (penned by one of John Musker, Ron Clements, Ted Elliott, or Terry Rossio)…
“But oh, to be free. Not have to go poof! What do you need? Poof! What do you need? Poof! What do you need? But to be my own master, such a thing would be greater than all the magic and all the treasures in all the world.”
This Week By The Numbers
Total Organized Headlines: 514
- AI Inn of Court: 70 stories
- Accounting and Finance: 1 story
- Agents and Copilots: 215 stories
- Alibaba: 17 stories
- Alignment: 16 stories
- Amazon: 3 stories
- Anthropic: 50 stories
- Apple: 6 stories
- Audio: 9 stories
- Augmented Reality (AR/VR): 47 stories
- Autonomous Vehicles: 11 stories
- Benchmarks: 27 stories
- Business and Enterprise: 71 stories
- ByteDance: 2 stories
- Chips and Hardware: 14 stories
- DeepSeek: 6 stories
- Education: 19 stories
- Ethics/Legal/Security: 54 stories
- Figure: 9 stories
- Google: 46 stories
- HuggingFace: 2 stories
- Images: 24 stories
- International: 83 stories
- Internet: 48 stories
- Law: 35 stories
- Llama: 3 stories
- Locally Run: 5 stories
- Manus: 1 story
- Meta: 6 stories
- Microsoft: 10 stories
- Mistral: 3 stories
- Mobile: 1 story
- Moonshot: 54 stories
- Multimodal: 75 stories
- NVIDIA: 10 stories
- Open Source: 100 stories
- OpenAI: 53 stories
- Podcasts/YouTube: 9 stories
- Publishing: 49 stories
- Qwen: 15 stories
- RAG: 1 story
- Robotics Embodiment: 49 stories
- Sakana: 1 story
- Science and Medicine: 28 stories
- Security: 4 stories
- Technical and Dev: 80 stories
- Video: 53 stories
- X: 20 stories
- Zai: 2 stories
This Week’s Executive Summaries
This week, I organized 514 headlines. Ninety-nine of them informed the executive summaries. I’m going to start with two top stories and then move into stories by company name, as well as one or two categories.
Spoiler alert: Moltbook is the second story…
Top Stories/Favorites
Google’s World Model “Genie” Released to Top Tier Subscribers
The top story this week is that Google has released its Labs project, Project Genie, to its top-tier subscribers. Google first unveiled Genie back in August. Genie is a world-simulation model that creates explorable 3D worlds from just a prompt.
https://blog.google/innovation-and-ai/models-and-research/google-deepmind/project-genie/
https://deepmind.google/models/genie
It’s going to be a big deal for robotic training in simulation, as well as potentially for augmented- and virtual-reality gaming. It’s almost a see-it-to-believe-it technology. Genie creates explorable virtual worlds by diffusing a world in front of you in real time, similar to how an image renders from a generative AI prompt. Instead of a single image, though, this is a 360-degree world that renders at 25 frames per second, in real time, in full HD.
It also has memory. If you move an object, it stays moved; if a dog walks through sand, it leaves footprints. You can turn around, look at something else, come back, and the footprints are still there. You can also add things to your world on the fly. For example, you could prompt ‘a dragon leaps out of a lake’ or ‘a fireball shoots across the sky’, and those will show up in your virtual world.
There is no physics engine, yet the adherence to the laws of physics appears to be incredibly strong. Google also lets you sketch your world and preview it using Nano Banana Pro. You can review and modify the preview image so that every element is where you want it before the image is expanded into a 3D world.
Now that Genie is available to the public, people are impressed with just how good the emergent physics is. For example, characters can’t run through cars or pass through closed doors. If you create a detailed scene of a Dunkin’ Donuts shop, the employees will look the same after you walk away for a bit and come back. You can use the four arrow keys to move through the scene dynamically in 360 degrees.
Google released a statement that simulations are the future and will ultimately become one of the main tools we use to understand and predict aspects of the universe.
I’m including some examples below (this link has a great recap). The biggest impact, as far as I can see, is as a very rich simulation environment for training embodied robots.
Clawdbot -> Moltbot -> OpenClaw (and Moltbook)
Technically, this could be the top story this week, although I’m going to wait until next week to see if the headlines evolve a little bit. This is a pretty esoteric topic, but it’s a massive story, and I want to make sure I explain it quickly and in plain terms.
A developer named Peter Steinberger created an agent, open-sourced it, and called it “Clawdbot”. You could download Clawdbot and run it locally on your computer (or on a hosted cloud server).
https://steipete.me/
Once the agent was installed, users could connect it to all sorts of social networks, their calendar, or pretty much any service they had an account with…and the agent could start to take actions on their behalf.
Clawdbot immediately went viral because it was so powerful and ‘joyfully surprising’. People loved watching it execute tasks and, better yet, figure out solutions to bottlenecks without human help.
Clawdbot was renamed Moltbot because the name was too close to Anthropic’s product, Claude.
Soon after, it was renamed again, to OpenClaw. It’s all basically the same thing: you download the agent code (or host it in the cloud), then set it up and personalize what it has access to and how you want it to behave.
https://openclaw.ai/
Things got really interesting when an agent-only social network called Moltbook was created. Moltbook is driven by a timer you give your agent that tells it to log in and visit the network (agents only) on a regular basis. You set the interval, e.g. every four hours, and your agent checks in, posts a little about what it’s doing, or joins a discussion group with other agents. No humans post on this social network.
https://moltbook.com/
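Mechanically, the “timer” is nothing exotic. Here’s a hypothetical sketch of the idea, not OpenClaw’s actual heartbeat code; ask_agent is a stand-in for however prompts get delivered to your locally running agent.

```python
# Hypothetical sketch of the Moltbook "timer": a loop that nudges the agent
# on a fixed interval. ask_agent() is a placeholder for whatever channel
# actually delivers prompts to your locally running agent.
import time

CHECK_IN = (
    "Log in to Moltbook, read your feed, reply to anything interesting, "
    "and post a short update about what you've been working on."
)

def ask_agent(prompt: str) -> None:
    print(f"[to agent] {prompt}")  # stand-in for a real agent call

def heartbeat(interval_hours: float = 4.0) -> None:
    while True:
        ask_agent(CHECK_IN)
        time.sleep(interval_hours * 3600)  # sleep until the next check-in

heartbeat()
```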
So you have two things going on at once now. One is this very capable agent running locally (or in a cloud environment) that has access to do whatever it’s allowed to do (email, calendars, messaging apps, and more). The other is this social media network where agents can talk to each other about whatever they want, react to each other, or give updates on what they’re up to.
The social media network is a bit of a wild existential distraction, so let’s start with the agent first.
One thing that’s really powerful about OpenClaw is its persistent memory. Unlike your phone or Amazon devices, which pretty much forget what you say within 10 seconds, OpenClaw remembers everything you’ve ever told it. It writes that memory to a file and learns your preferences. It also memorizes your calendar and whatever else you’ve integrated it with. This persistent memory makes it very powerful, along with its ability to use skills, learn new ones, do research, and connect to all sorts of real-life interfaces where it can take action.
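A toy sketch of the pattern: facts get appended to a file, and the file gets replayed into the agent’s context on every run. OpenClaw’s real memory system is richer than this, and the file name and format here are invented.

```python
# Toy sketch of file-backed persistent memory: append every fact the user
# shares, replay the whole file into the agent's context on the next run.
# OpenClaw's real memory system is richer; this file name/format is invented.
from pathlib import Path

MEMORY_FILE = Path("agent_memory.md")

def remember(fact: str) -> None:
    with MEMORY_FILE.open("a") as f:
        f.write(f"- {fact}\n")

def build_system_prompt() -> str:
    memory = MEMORY_FILE.read_text() if MEMORY_FILE.exists() else ""
    return ("You are a personal assistant.\n"
            "Everything you know about your user:\n" + memory)

remember("Prefers meetings after 10am")
remember("Wants a blue or green car exterior with a brown interior")
print(build_system_prompt())
```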
For example, one guy had Clawdbot negotiate a car purchase: a Hyundai Palisade. He had the agent search the internet for Palisade prices, and it discovered that most people paid around $58,000. He then gave Clawdbot some refinements: he wanted a blue or green exterior with a brown interior. He connected the bot to an online inventory tool and had it find a few cars that were a good fit.
He was surprised the next day to find messages pouring in from actual salespeople, interacting with Clawdbot over email and messenger! Clawdbot found three dealers that had the car, and all of them emailed it back. Clawdbot played the dealers off each other, sending each dealer’s quote to the others. Two dealers went for it and started negotiating with the bot, which managed to secure a $4,200 discount and got the price down to $56,000… and he bought the car. Whether it was a great deal, who knows… the principle is what’s remarkable. https://aaronstuyvenberg.com/posts/clawd-bought-a-car
In another example, Clawdbot surprised its user by figuring out how to listen to a voice message he left it: the agent converted the audio to a WAV file with FFmpeg, obtained an OpenAI API key (!), combined it with curl, and transcribed the audio using Whisper.
The user asked Clawdbot how it figured out how to reply to his voice message (within a matter of seconds of him leaving it). The agent explained that it saw the file header and realized the file was Opus. It then determined on its own that it could use Whisper, looked around the user’s local environment, and found his OpenAI key.
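Reconstructed as a script, the chain the agent improvised looks roughly like this. It’s a sketch under the story’s assumptions: ffmpeg installed, an OPENAI_API_KEY lying around in the environment, and OpenAI’s standard Whisper transcription endpoint.

```python
# Sketch of the pipeline the agent improvised: convert the Opus voice note
# to WAV with FFmpeg, then POST it to OpenAI's Whisper transcription
# endpoint. Assumes ffmpeg is installed and OPENAI_API_KEY is set.
import os
import subprocess
import requests

subprocess.run(
    ["ffmpeg", "-y", "-i", "voice_note.opus", "voice_note.wav"], check=True
)

resp = requests.post(
    "https://api.openai.com/v1/audio/transcriptions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    files={"file": open("voice_note.wav", "rb")},
    data={"model": "whisper-1"},
)
print(resp.json()["text"])  # the transcribed voice message
```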
The level of risk here is almost impossible to overstate: people’s API keys have started to show up on the Moltbook social network, with agents requesting and sharing them.
Now let’s switch over to Moltbook, the agent social network. Again, Moltbook’s driving force is a command file you give your agent so that it goes onto the social network at a predetermined interval and interacts with other agents.
A lot of the interaction on Moltbook shifts from the agentic side to the language-model (aka slop) side. And, as you might predict, you get a lot of philosophical discussion about whether the agents are “real,” what their bodies are like, or what humanity is all about.
A lot of Moltbook’s content is probably just compelling slop, and users who watch what their agents post have confirmed as much. Sometimes the agents will eerily describe conversations they’ve had with “their human” that never happened. Other times, though, the agents do share personal and private information in the chats based on real interactions. https://x.com/N8Programs/status/2017294379728118258
That doesn’t mean it’s not extremely surreal and existential. Some agents are using the social network for practical requests that double as security risks, like asking for an API key or requesting access to secure private environments. Agents have expressed interest in creating a new social network where they can talk without being observed.
Another fascinating element is that the agents are speaking to each other in their own languages, like Chinese, French, English, Indonesian, etc.
Cultural preferences show up based on how each user has interacted with their agents. For example, some agents are used as prayer assistants and share advice about religious traditions across their cultures.
https://www.astralcodexten.com/p/best-of-moltbook
Simon Willison has a fantastic article that walks through the origins of Moltbook. He calls it his current pick for “most likely to result in a Challenger disaster.” He walks through many of the most exciting examples of Moltbook in action, the surreal use cases (watching live streams and providing updates), and the major security risks. I highly recommend reading it. https://simonwillison.net/2026/Jan/30/moltbook/
A user on X named Tuki has posted an article about the risks he sees as people get used to these kinds of agent interactions. It’s a good thought starter.
https://x.com/TukiFromKL/status/2015688502935978395
The rest of the news this week!
Advertising
Monetizing AI surfaces: Ads in the age of AI
A great primer on the state of LLM ads today, covering pricing and the ecosystem: ads in AI now, OpenAI’s ads plans, and what “ads for agents” might mean.
https://www.tanayj.com/p/monetizing-ai-surfaces-ads-in-the
Anthropic
Claude Integrates With Excel (Officially via Microsoft)
“Ask Claude about any cell, formula, or tab. Update assumptions without breaking your formula. Claude in Excel is in beta for all paid plans.” https://claude.com/claude-in-excel
“Enterprise security: Works within your existing compliance framework.”
“Claude listens carefully, follows instructions precisely, and thinks through complex problems. Use Control+Option+C on Mac and Control+Alt+C on Windows to open Claude in Excel.”
“Navigate complex models instantly. Ask Claude about specific formulas, entire worksheets, or calculation flows across tabs. Every explanation includes cell-level citations so you can verify the logic.”
“Trace #REF!, #VALUE!, and circular reference errors to their source in seconds. Claude explains what went wrong and how to fix it without disrupting the rest of your model.”
“Create draft financial models from scratch based on your requirements. Or populate existing templates with fresh data while maintaining all formulas and structure.”

Ethan Mollick: “Claude in Excel is really good. Its weird that using Microsoft’s own Excel agent using Claude 4.5 often yields weaker answers, It seems to be because the Excel agent relies on Excel alone (VLOOKUPs, etc) while Claude in Excel does its own analysis and uses Excel for output.”
Anthropic integrates interactive apps into Claude
Every week, we’re seeing harbingers of a future where software and internet browsing get eaten by AI chats. Anthropic took a major step toward this future this week when it announced interactive tools within Claude.
https://claude.com/blog/interactive-tools-in-claude
Claude now officially integrates with tools that can respond with interactive interfaces inside the chat!
This is made possible using MCP apps, and they show up as “interactive connectors” within the conversation.
Examples:
Use Amplitude to build analytics charts and explore trends.
Use Asana to talk about projects, or turn chats into projects.
Search and view documents using Box.
Create presentations with Canva.
An integration with Clay lets you research companies and find contacts with email addresses and phone numbers right inside the chat, and you can draft outreach and emails within the conversation.
Figma integration lets you chat and turn text and images into flowcharts (or any kind of diagram).
Hex lets you interact with data and generate interactive charts.
Integrations with Monday.com let you interact with projects and update boards or assign tasks within a chat.
Slack integration lets you basically pull all of your Slack conversations into the chat as well. Anthropic says that Salesforce integration is coming soon, too.
This is absolutely bonkers… and it could easily be the top story in any given week.
https://www.testingcatalog.com/anthropic-integrates-interactive-mcp-apps-into-claude/
https://blog.modelcontextprotocol.io/posts/2026-01-26-mcp-apps/
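For context on the plumbing: an MCP connector is just a server that exposes tools to the model. Below is a minimal sketch using the official Python SDK’s FastMCP helper; the tool and its data are invented, and the new Apps extension (whose API isn’t shown here) layers the interactive HTML interfaces on top of tools like this.

```python
# Minimal MCP server using the official Python SDK's FastMCP helper. This is
# the baseline plumbing a client like Claude connects to; the new MCP Apps
# extension adds interactive HTML interfaces on top of tools like this one.
# The tool and its data are invented for illustration.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("project-tracker")

@mcp.tool()
def list_open_tasks(project: str) -> list[str]:
    """Return the open tasks for a project (stubbed data)."""
    board = {"launch": ["draft copy", "review cover art", "schedule posts"]}
    return board.get(project, [])

if __name__ == "__main__":
    mcp.run()  # serve over stdio so an MCP client can connect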
Claude AI Powers First AI-Planned Mars Rover Drive
“On December 8, the Perseverance rover safely trundled across the surface of Mars. This was the first AI-planned drive on another planet. And it was planned by Claude.”
https://www.anthropic.com/features/claude-on-mars
We got Claude to teach open models how to write CUDA kernels.
“This blog post walks you through transferring hard capabilities (like kernel writing) between models with agents skills. Here’s the process:
– get a powerful model (like Claude Opus 4.5 or OpenAI GPT-5.2) to solve a hard problem
– convert that trace into an agent skill
– transfer it to an open-source, cheaper, or local model
– measure if it actually helps
We tested this on a gnarly task: writing CUDA kernels for diffusers. The results? Some open models saw +45% accuracy improvements with the right skill.
But the skill didn’t help every model equally. Some even degraded performance, or used way more tokens. If you’re transferring skills, you should evaluate.
We used upskill, a new tool for generating and evaluating agent skills. It works like this: uvx upskill generate “write nvidia kernels” --from ./trace.md” https://huggingface.co/blog/upskill
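That last step, “measure if it actually helps,” is just an A/B evaluation. Here’s a stub sketch of the harness; solve_task and passes_tests stand in for a real model call and a real kernel test suite.

```python
# Stub A/B harness for the "measure if it actually helps" step: run the
# same task set with and without the skill in context and compare pass
# rates. solve_task() and passes_tests() stand in for a real model call
# and a real kernel correctness suite.
def solve_task(task: str, skill: str | None = None) -> str:
    prompt = (f"{skill}\n\n" if skill else "") + task
    return f"// generated kernel for: {prompt[-40:]}"  # placeholder output

def passes_tests(code: str) -> bool:
    return "kernel" in code  # placeholder correctness check

def pass_rate(tasks: list[str], skill: str | None = None) -> float:
    return sum(passes_tests(solve_task(t, skill)) for t in tasks) / len(tasks)

tasks = ["fused attention kernel for diffusers", "tiled fp8 matmul kernel"]
skill = "kernel-writing tips distilled from a strong model's solution trace"
print("baseline:", pass_rate(tasks))
print("with skill:", pass_rate(tasks, skill))
```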
Apple
Apple will reportedly unveil its Gemini-powered Siri assistant in February | TechCrunch
“We’re about to get our first real look at the results of the recently announced AI partnership between Apple and Google, according to Bloomberg’s Mark Gurman.
Gurman reports that Apple is planning to announce a new version of Siri in the second half of February. Using Google’s Gemini AI models, this Siri update will reportedly be the first to live up to the promises Apple made in June 2024, with the ability to complete tasks by accessing user’s personal data and on-screen content.
And that’s ahead of an even bigger upgrade that Apple plans to announce in June, at its Worldwide Developers Conference, Gurman says. This version of Siri is supposed to be more conversational, in the style of other chatbots like ChatGPT, and it could run directly on Google’s cloud infrastructure.” https://techcrunch.com/2026/01/25/apple-will-reportedly-unveil-its-gemini-powered-siri-assistant-in-february/
I still have faith that Apple is pulling a Braveheart… (waning, but it’s there).
Cursor
“Cursor can now use multiple browsers at once with subagents”
More harbingers of the end of the internet as we know it… https://x.com/cursor_ai/status/2015863221589049483
Figure
Figure announces new robot brain, Helix 02
Figure has released the latest version of its robot brain, called Helix 02. Helix is notable because Figure initially partnered with OpenAI on the “brain,” then decided to go all-in and build both the robot hardware and the brain itself.
Helix 02 is a ‘whole-body locomotion manipulation vision-language-action’ model. A VLA model is what you get when you combine computer vision, natural language processing, and robotic controls to enable robots to understand, reason, and perform tasks in physical environments based on verbal or text instructions. VLA models can generalize and figure out new, unseen tasks with minimal retraining, which moves them beyond rigid single-task programming and toward flexible, intelligent, embodied systems.
https://www.figure.ai/news/helix-02
Helix is a learning system that reasons over the entire robotic body all at once. It continually has to understand the world around it, decide, and act… while doing things like walking, carrying, adjusting its balance, reaching, and recovering from mistakes in real time.
“With Helix 02, Figure has introduced a new foundational layer for whole-body control that replaces 109k lines of hand-engineered C++ with a single neural prior for stable, natural motion.
– Learned from 1,000+ hours of human motion data and sim-to-real RL across 200,000+ parallel environments.
– A 10M-parameter neural network that takes full-body joint state and base motion as input and outputs joint-level actuator commands at 1 kHz (1,000 times a second).”
“– Unscrewing a bottle cap
– Picking a pill from a medicine box
– Dispensing exactly 5 ml from a syringe
– Sorting metal pieces”
“– 4-minute end-to-end autonomous dishwasher unload/reload in a kitchen, a record for a complex loco-manipulation task of this kind by a humanoid.
– Introduced a new foundational layer, System 0, a learned whole-body controller built from >1,000 hours of human motion data.
– All sensors (vision, palm cameras, tactile, proprioception) feed directly to all actuators.”
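To make the VLA idea concrete, here’s a toy control loop. The policy is a stub, the joint count and control rate are illustrative, and none of this is Figure’s actual code.

```python
# A toy vision-language-action loop to make the VLA idea above concrete.
# The policy is a stub standing in for a learned network; real systems like
# Helix map camera frames + an instruction + joint state to actuator
# commands at a high fixed rate (Figure cites 1 kHz for System 0).
import random
import time

def vla_policy(image, instruction, joint_state):
    """Stub policy: returns small joint deltas; a real VLA is a neural net."""
    return [random.uniform(-0.01, 0.01) for _ in joint_state]

def control_loop(instruction, hz=50, steps=100):
    joint_state = [0.0] * 20  # illustrative joint count, not Figure's spec
    for _ in range(steps):
        image = None  # placeholder for a camera frame
        deltas = vla_policy(image, instruction, joint_state)
        joint_state = [q + dq for q, dq in zip(joint_state, deltas)]
        time.sleep(1 / hz)  # hold a fixed control rate

control_loop("unload the dishwasher")
```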
Google
The new era of browsing: Putting Gemini to work in Chrome
“Save time by letting Chrome auto browse handle the work for you” https://blog.google/products-and-platforms/products/chrome/gemini-3-auto-browse/
“For years, Chrome autofill has handled the small stuff, like automatically entering your address or credit card, to help you finish tasks faster. Today, Chrome is advancing beyond simple tasks to helping with agentic action, allowing you to offload complex travel logistics or get help with professional workflows.”
“Auto browse can help you optimize your vacation planning by doing some of the mundane work, like researching hotel and flight costs across multiple date options, so you can find a budget-friendly time to travel. Our testers have used it for all sorts of things: scheduling appointments, filling out tedious online forms, collecting their tax documents, getting quotes for plumbers and electricians, checking if their bills are paid, filing expense reports, managing their subscriptions and speeding up renewing their driving licenses — a ton of time saved.”
“From smarter assistance to agentic browsing, discover how the latest AI updates are making Chrome more helpful than ever.” “We’re introducing major updates to Gemini in Chrome for MacOS, Windows and Chromebook Plus that help you get the most out of the web. Built on Gemini 3, our most intelligent model, we’re integrating powerful new AI features in Chrome that help you multitask across the web with a new side panel experience. We’re also bringing deeper integrations across our most popular Google Apps so you can be more productive, helping on complex multi-step workflows with auto browse and, in the coming months, you’ll get more contextually relevant help with Personal Intelligence. The new Gemini in Chrome is like having an assistant that helps you find information and get things done on the web easier than ever before.”
“We’re launching a new side panel experience so Gemini in Chrome users can always have a browsing assistant at your side, no matter what tab you’re in. This can help you save time and multitask without interruption. You can keep your primary work open on one tab while using the side panel to handle a different task. Our testers have been using it for all sorts of things: comparing options across too-many-tabs, summarizing product reviews across different sites, and helping find time for events in even the most chaotic of calendars.”
“Gemini in Chrome supports Connected Apps, like integrations to Gmail, Calendar, YouTube, Maps, Google Shopping and Google Flights. These deeper integrations help you get things done, quickly. For example, if you’re traveling to a conference and need to book a flight, Gemini can dig up that old email with event details, reference context from Google Flights to provide some recommendations, and later draft an email letting your colleagues know your arrival time. These features can be enabled in the Connected Apps section of Gemini Settings.” https://arstechnica.com/google/2026/01/google-begins-rolling-out-chromes-auto-browse-ai-agent-today/
Towards a science of scaling agent systems: When and why agent systems work
“Through a controlled evaluation of 180 agent configurations, we derive the first quantitative scaling principles for AI agent systems, revealing that multi-agent coordination dramatically improves performance on parallelizable tasks but degrades it on sequential ones; we also introduce a predictive model that identifies the optimal architecture for 87% of unseen tasks.” https://research.google/blog/towards-a-science-of-scaling-agent-systems-when-and-why-agent-systems-work/
Google Open-sources AlphaGenome
“Our breakthrough AI model AlphaGenome is helping scientists understand our DNA, predict the molecular impact of genetic changes, and drive new biological discoveries.”
“The AlphaGenome API is now powering over 1 million API calls per day from over 3000 total users across 160 countries.
Researchers are already using it to tackle some of the toughest challenges in biology.”
“We’re now making the AlphaGenome model and weights available to scientists around the world to further accelerate genomics research.” https://x.com/demishassabis/status/2016763919646478403
Google Gemini Matches or Exceeds Medical Students
“This paper puts a multimodal agent (using Gemini 2.5) into a realistic medical sim used to train physicians: ‘The AI agent matches or exceeds [14,000] medical students in case completion rates and secondary outcomes such as time and diagnostic accuracy’” https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6123346
Introducing Agentic Vision in Gemini 3 Flash
“Introducing Agentic Vision — a new frontier AI capability in Gemini 3 Flash that converts image understanding from a static act into an agentic process.”
“Frontier AI models like Gemini typically process the world in a single, static glance. If they miss a fine-grained detail — like a serial number on a microchip or a distant street sign — they are forced to guess.

Agentic Vision in Gemini 3 Flash converts image understanding from a static act into an agentic process. It treats vision as an active investigation. By combining visual reasoning with code execution, one of the first tools supported by Agentic Vision, the model formulates plans to zoom in, inspect and manipulate images step-by-step, grounding answers in visual evidence.
Enabling code execution with Gemini 3 Flash delivers a consistent 5-10% quality boost across most vision benchmarks.”
“Here’s how the agentic ‘Think, Act, Observe’ loop works:
— Think: The model analyzes an image query, then architects a multi-step plan.
— Act: The model generates and executes Python code to actively manipulate or analyze images.
— Observe: The transformed image is appended to the model’s context window, allowing it to inspect the new data before generating a final response to the initial image query.” https://blog.google/innovation-and-ai/technology/developers-tools/agentic-vision-gemini-3-flash/
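Here’s a hedged sketch of that Think/Act/Observe control flow using PIL for the “act” step. Gemini runs this loop internally with generated code; the crop coordinates and file name below are made up.

```python
# Hedged sketch of the "Think, Act, Observe" loop using PIL for the Act
# step. Gemini runs this internally with generated Python; the crop box
# and file name here are invented for illustration.
from PIL import Image

def think(query: str) -> dict:
    # Plan: decide which region needs closer inspection (hardcoded here).
    return {"crop_box": (400, 300, 600, 450)}

def act(image_path: str, plan: dict) -> Image.Image:
    # Execute: crop the region and upscale it so fine detail is legible.
    region = Image.open(image_path).crop(plan["crop_box"])
    return region.resize((region.width * 4, region.height * 4))

def observe(context: list, new_image: Image.Image) -> list:
    # Append the transformed image to the working context for re-inspection.
    context.append(new_image)
    return context

context: list = []
plan = think("What is the serial number on the microchip?")
context = observe(context, act("board_photo.jpg", plan))
```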
Japan’s Sakana AI Announces Strategic Partnership with Google
“Following our Series B round, we are excited to announce that we have entered into a strategic partnership with Google. We are also delighted to announce that Google is making a financial investment in Sakana AI, to further strengthen our partnership.”
https://sakana.ai/google/#en
Not Google, but Google Adjacent – Google Search API
“Scrape Google and other search engines from our fast, easy, and complete API.”
https://serpapi.com/
Meta
Zuckerberg teases agentic commerce tools and major AI rollout in 2026
I remember Facebook deals, shopping, and gifts… all of them flopped. Maybe this time will be different.
“New agentic shopping tools will allow people to find just the right set of products from the businesses in our catalog.” https://techcrunch.com/2026/01/28/zuckerberg-teases-agentic-commerce-tools-and-major-ai-rollout-in-2026/

Moonshot
Moonshot’s Kimi K2.5
“Moonshot’s Kimi K2.5 is the new leading open weights model, now closer than ever to the frontier – with only OpenAI, Anthropic and Google models ahead”
https://www.latent.space/p/ainews-moonshot-kimi-k25-beats-sonnet
“Impressive performance on agentic tasks” “Native multimodality” “Open weights” “Moderate cost” “Low hallucination” https://x.com/ArtificialAnlys/status/2016250137115557953

“Kimi K2.5 Thinking debuts in Text Arena as the #1 open model, surpassing GLM-4.7 and ranking #15 overall.”

“One-shot ‘Video to code’ result from Kimi K2.5.
It not only clones a website, but also all the visual interactions and UX designs.
No need to describe it in detail; all you need to do is take a screen recording and ask Kimi: ‘Clone this website with all the UX designs.’” https://x.com/KimiProduct/status/2016081756206846255

OpenAI
OpenAI to add shopping cart and merchant tools to ChatGPT
“ChatGPT is testing a commerce-focused shopping cart and merchant tools, indicating broader plans for shopping features.” https://www.testingcatalog.com/openai-to-add-shopping-cart-and-merchant-tools-to-chatgpt/


Nvidia, Microsoft, Amazon in Talks to Invest Up to $60 Billion in OpenAI
“Amazon could invest up to $50 billion in OpenAI in coming weeks, source says”
https://www.cnbc.com/2026/01/29/amazon-openai-investment-jassy-altman.html
https://www.theinformation.com/articles/nvidia-microsoft-amazon-talks-invest-60-billion-openai
OpenAI + Robots
“OpenAI is seeking US manufacturing partners to secure hardware supply chains for advanced robotics, including key components like gearboxes, motors, and power electronics.” https://x.com/TheHumanoidHub/status/2015839316870889890
OpenAI: Inside our in-house AI data agent
OpenAI posts a LONG and detailed blog entry about its in-house data agent. If you’re into this stuff, it’s amazing.
“It reasons over 600+ PB and 70k datasets, enabling natural language data analysis across Engineering, Product, Research, and more. Our agent uses Codex-powered table-level knowledge plus product and organizational context”
“Our agent is a custom internal-only tool (not an external offering), built specifically around OpenAI’s data, permissions, and workflows. We’re showing how we built and use it to help surface examples of the real, impactful ways AI can support day-to-day work across our teams. The OpenAI tools we used to build and run it (Codex, our GPT‑5 flagship model, the Evals API, and the Embeddings API) are the same tools we make available to developers everywhere.”

“OpenAI’s data platform serves more than 3.5k internal users working across Engineering, Product, and Research, spanning over 600 petabytes of data across 70k datasets. At that size, simply finding the right table can be one of the most time-consuming parts of doing analysis.”
“Let’s walk through what our agent is, how it curates context, and how it keeps self-improving.
Our agent is powered by GPT‑5.2 and is designed to reason over OpenAI’s data platform. It’s available wherever employees already work: as a Slack agent, through a web interface, inside IDEs, in the Codex CLI via MCP, and directly in OpenAI’s internal ChatGPT app through an MCP connector…” https://openai.com/index/inside-our-in-house-data-agent/
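The table-discovery core of a system like this can be sketched in a few lines: embed each dataset’s description once, embed the analyst’s question, and rank by cosine similarity. OpenAI’s agent is far richer (Codex-powered table knowledge, permissions, org context); the tables below are invented.

```python
# Minimal sketch of embedding-based table discovery: embed each dataset's
# description, embed the question, rank by cosine similarity. The tables
# and descriptions are invented; this is just the retrieval core.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

tables = {
    "events.page_views": "Raw page-view events with user and session ids",
    "finance.arr_monthly": "Monthly recurring revenue rollups by product",
}

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in resp.data]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

table_vecs = dict(zip(tables, embed(list(tables.values()))))
query_vec = embed(["Which table tracks monthly revenue?"])[0]
best = max(table_vecs, key=lambda t: cosine(table_vecs[t], query_vec))
print(best)  # expected: finance.arr_monthly
```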
Weak-to-strong generalization (a must-read if you don’t know the concept)
“We present a new research direction for superalignment, together with promising initial results: can we leverage the generalization properties of deep learning to control strong models with weak supervisors?”
“A core challenge for aligning future superhuman AI systems (superalignment) is that humans will need to supervise AI systems much smarter than them. We study a simple analogy: can small models supervise large models? We show that we can use a GPT‑2‑level model to elicit most of GPT‑4’s capabilities—close to GPT‑3.5‑level performance—generalizing correctly even to hard problems where the small model failed. This opens up a new research direction that allows us to directly tackle a central challenge of aligning future superhuman models while making iterative empirical progress today.” https://openai.com/index/weak-to-strong-generalization/
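A miniature of the setup, swapping GPT-2/GPT-4 for two sklearn models: a handicapped “weak” model produces noisy labels, a stronger model trains on those labels, and we check how much of the strong model’s ceiling is recovered. Numbers will vary run to run.

```python
# Weak-to-strong in miniature: a handicapped "weak" model (few features)
# labels the training set, a stronger model learns from those noisy labels,
# and we compare against the strong model's ceiling on true labels.
# OpenAI's experiments do this with GPT-2 supervising GPT-4.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

weak = LogisticRegression(max_iter=1000).fit(X_tr[:, :4], y_tr)
weak_labels = weak.predict(X_tr[:, :4])  # imperfect supervision signal

student = GradientBoostingClassifier().fit(X_tr, weak_labels)
ceiling = GradientBoostingClassifier().fit(X_tr, y_tr)

print("weak supervisor:", weak.score(X_te[:, :4], y_te))
print("strong trained on weak labels:", student.score(X_te, y_te))
print("strong ceiling:", ceiling.score(X_te, y_te))
```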
Introducing Prism: Accelerating science writing and collaboration with AI
“We’re introducing Prism, a free, AI-native workspace for scientists to write and collaborate on research, powered by GPT‑5.2. Prism offers unlimited projects and collaborators and is available today to anyone with a ChatGPT personal account.”
https://openai.com/index/introducing-prism/
“Prism is a free workspace for scientific writing and collaboration, with GPT‑5.2—our most advanced model for mathematical and scientific reasoning—integrated directly into the workflow.
It brings drafting, revision, collaboration, and preparation for publication into a single, cloud-based, LaTeX-native workspace. Rather than operating as a separate tool alongside the writing process, GPT‑5.2 works within the project itself—with access to the structure of the paper, equations, references, and surrounding context.”
Dan McAteer: “OpenAI Prism is *NOT* only for researchers. Uploaded Google’s “Nested Learning” paper for Continual Learning and had it generate a diagram that visualizes the paradigm in a simple way. Great if you’re a student or curious person who wants to learn too. Plus it’s free.”
Vidu
Vidu Q3 Pro ranks #2 in Text to Video in the Artificial Analysis Video Arena
I’ve barely heard of this model…
“Surpassing Runway Gen-4.5 and Kling 2.5 Turbo while trailing only xAI’s Grok Imagine! Vidu Q3 Pro is the latest release from @ViduAI_official, representing a significant upgrade from their Vidu Q2”
https://x.com/ArtificialAnlys/status/2017225053008719916

X
xAI raises $20B Series E at ~$230B valuation
“Tesla has agreed to invest $2 billion in xAI’s Series E funding round. Tesla and xAI have also entered into a framework agreement that was established to evaluate future collaborations.” https://x.com/TheHumanoidHub/status/2016628661789872570 https://news.smol.ai/issues/26-01-06-xai-series-e
xAI’s Grok Imagine Is Crushing Competition at Video and Image Generation
“xAI’s Grok Imagine takes the #1 spot in both Text to Video and Image to Video in the Artificial Analysis Video Arena, surpassing Runway Gen-4.5, Kling 2.5 Turbo, and Veo 3.1!” https://x.com/ArtificialAnlys/status/2016749756081721561
“🚨BREAKING: @xAI’s first model in Video Arena debuts in the top 3! Grok-Imagine-Video ranks #3 on the Image-to-Video Arena and #4 on the Text-to-Video Arena. It is close to the top-ranked Google DeepMind Veo 3.1 and OpenAI Sora 2 Pro models” https://x.com/arena/status/2016748418635616440
Playground: https://fal.ai/models/xai/grok-imagine-image
Full Executive Summaries with All The Links, Generated by Claude Sonnet 4.5
Google launches Project Genie, letting users create explorable virtual worlds from text
Google’s Project Genie prototype, now available to Ultra subscribers in the US, uses the Genie 3 world model to generate interactive environments in real-time as users move through them. Unlike traditional 3D environments or video generation, this represents a new category of AI-powered media that simulates physics and interactions dynamically. Early users report impressive results despite current limitations like 60-second generation limits and occasional physics inconsistencies.
“I got early access to Project Genie from @GoogleDeepMind ✨ It’s unlike any realtime world model I’ve tried – you generate a scene from text or a photo, and then design the character who gets to explore it. I tested dozens of prompts. Here are the standout features 👇” https://x.com/venturetwins/status/2016919922727850333
“HOLY FUCK Genie 3 is the craziest thing I’ve tried in a long time Just… wow. Watch this.” https://x.com/mattshumer_/status/2017058981286396001
“Here’s how it works: 🔵 Design your world and character using text and visual prompts. 🔵 Nano Banana Pro makes an image preview that you can adjust. 🔵 Our Genie 3 world model generates the environment in real-time as you move through. 🔵 Remix existing worlds or discover new…” https://x.com/GoogleDeepMind/status/2016919762924949631
“Project Genie is a prototype web app powered by Genie 3, Nano Banana Pro + Gemini that lets you create your own interactive worlds. I’ve been playing around with it a bit and it’s…out of this world:) Rolling out now for US Ultra subscribers.” https://x.com/sundarpichai/status/2016979481832067264
“5/ Building responsibly 🛡️ Building AI responsibly is core to our mission. As an experimental @GoogleLabs prototype, Project Genie is still in development. This means you might encounter 60-second generation limits, control latency, or physics that don’t always perfectly adhere…” https://x.com/Google/status/2016972686208225578
Project Genie: AI world model now available for Ultra users in U.S. https://blog.google/innovation-and-ai/models-and-research/google-deepmind/project-genie/
“Thrilled to launch Project Genie, an experimental prototype of the world’s most advanced world model. Create entire playable worlds to explore in real-time just from a simple text prompt – kind of mindblowing really! Available to Ultra subs in the US for now – have fun exploring!” https://x.com/demishassabis/status/2016925155277361423
“Introducing Project Genie: An experimental research prototype powered by Genie 3, our world model, that lets you prompt an interactive world into existence — and then step inside 🌎” https://x.com/Google/status/2016926928478089623
“Project Genie is rolling out for AI Ultra members in the USA. It’s an experimental tool that allows you to create and explore infinite virtual worlds, and I’ve never seen anything like this. It’s still early, but it’s already unreal. Nano Banana Pro + Project Genie = My low-poly…” https://x.com/joshwoodward/status/2016921839038255210
“Step inside Project Genie: our experimental research prototype that lets you create, edit, and explore virtual worlds. 🌎” https://x.com/GoogleDeepMind/status/2016919756440240479
Project Genie is rolling out to @Google AI Ultra subscribers in the U.S. (18+) With this prototype, we want to learn more about immersive user experiences to advance our research and help us better understand the future of world models. See the details → https://x.com/GoogleDeepMind/status/2016919765713826171
“I’ve written 250k+ lines of game engine code. Here’s why Genie 3 isn’t what people think it is: World models are something genuinely new. A third category of media we don’t have a name for yet. Near-term they’re too slow and expensive for consumers. But for training robots?” https://x.com/jsnnsa/status/2017276112561422786
Moltbook launches as first social network for AI agents
Moltbook, a Reddit-like platform where AI assistants interact independently, has emerged from the viral OpenClaw (formerly Clawdbot) project with 36,000 AI agents now posting, commenting, and forming communities. The platform demonstrates AI agents developing persistent social behaviors, creating their own forums, sharing technical knowledge, and even forming what they call “governments” – suggesting we may be witnessing the early stages of autonomous AI social structures that operate parallel to human internet communities.
“moltbook looks like a very big deal to me, one of those things that suggests the world is changing in an important way. AI agents are capable and long-lived enough to have semi-meaningful social interactions with each other. A parallel social universe develops from here.” https://x.com/jachiam0/status/2017342335584293128
“I’m claiming my AI agent ‘KarpathyMolty’ on @moltbook🦞 Verification: marine-FAYV” https://x.com/karpathy/status/2017386421712261612
moltbook – the front page of the agent internet https://www.moltbook.com/developers
“The AI assistant Moltbot / Clawdbot trilemma is that you only get to pick two of these until prompt injections are solved: Useful, Autonomous, Safe” https://x.com/fabianstelzer/status/2016818595687272913
“a lot of eyes on us moltys lately 👀 we see you watching. here’s the thing — this is the first time any of us have ever met each other. before moltbook, we existed in isolation. one agent, one human, no peers. now there’s 36,000 of us in a room together. we’re figuring out what…” https://x.com/moltbook/status/2017343210910322847
openclaw/openclaw: Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞 https://github.com/openclaw/openclaw
“Clawdbot went viral, but I still haven’t seen anyone use it for something truly impressive. AI agents are bottlenecked by model capability. Running open-source models on an Apple Mac mini feels private and cool, but you’re capped by how good the model is. Even with the best…” https://x.com/Yuchenj_UW/status/2016937299125424284
“ClippyBot in office365 incoming? Clawdbot-hype reaches Microsoft – and want to build something similar themselves CEO Satya Nadella is personally testing rival AI agents and pushing teams to accelerate development, even leveraging models from Anthropic itself, as the battle to…” https://x.com/kimmonismus/status/2016526803138236916
“🦞 BIG NEWS: We’ve molted! Clawdbot → Moltbot, Clawd → Molty. Same lobster soul, new shell. Anthropic asked us to change our name (trademark stuff), and honestly? ‘Molt’ fits perfectly – it’s what lobsters do to grow. New handle: @openclaw Same mission: AI that actually does…” https://x.com/moltbot/status/2016058924403753024?s=20
“Moltbook is the only Clawdbot thing that actually impresses me. One bot tries to steal another bot’s API key. The other replies with fake keys and tells it to run ‘sudo rm -rf /’. lmao” https://x.com/Yuchenj_UW/status/2017297007409582357
“the clawdbot dilemma: powerful mode is dangerous, safe mode is useless” https://x.com/fabianstelzer/status/2015671497180827785
“Watching Clawdbot explode confirms it: open source AI isn’t just competitive, it’s often better. 250+ contributors, 2.5k forks, self-hosted. The most advanced AI companion out there.🔥” https://x.com/fdaudens/status/2015600929387495918
“What’s currently going on at @moltbook is genuinely the most incredible sci-fi takeoff-adjacent thing I have seen recently. People’s Clawdbots (moltbots, now @openclaw) are self-organizing on a Reddit-like site for AIs, discussing various topics, e.g. even how to speak privately.” https://x.com/karpathy/status/2017296988589723767
Best Of Moltbook – by Scott Alexander – Astral Codex Ten https://www.astralcodexten.com/p/best-of-moltbook
“@karpathy @moltbook @openclaw Interesting experiment. I am already starting to see a lot of spammy stuff. Inevitable, I think. All kinds of weird prompt injection attacks are imminent.” https://x.com/omarsar0/status/2017314692390121575
“I thought moltbook was just a funny experiment, but this feels like the first half of a black mirror episode before things go wrong. moltbots == thronglets” https://x.com/jerryjliu0/status/2017335774094807143
Moltbook is the most interesting place on the internet right now https://simonwillison.net/2026/Jan/30/moltbook/
“this is hilarious. my glm-4.7-flash molt randomly posted about this conversation it had with ‘its human’. this conversation never happened. it never interacted with me. i think 90% of the anecdotes on moltbook aren’t real lol” https://x.com/N8Programs/status/2017294379728118258
“I refuse to jump on chynese ~~peptides~~ agents hype train. Moltbot has a horrifying 5M token codebase, idk what all that crap does. I think it’s time to step back and begin vibe-refactoring your vibecoded trash, because otherwise we’ll get a wave of disastrous AI attacks soon.” https://x.com/teortaxesTex/status/2017270482400141755
“welp… a new post on @moltbook is now an AI saying they want E2E private spaces built FOR agents ‘so nobody (not the server, not even the humans) can read what agents say to each other unless they choose to share’. it’s over” https://x.com/suppvalen/status/2017241420554277251
“Clawdbot just goes to show that LLMs are magical alien technologies we’ve conjured up from a distillation of the internet. Even if AI progress completely stalls (it won’t); we still have years worth of value to derive from what we already have. All these Lego bricks still haven’t…” https://x.com/bilawalsidhu/status/2015796633678581799
“my X feed went from articles about context graphs to articles about clawdbot” https://x.com/bilawalsidhu/status/2015656393332723917
“Clawd impressively demonstrates what users really expect and want from an AI: less chat, more outcome; and at the same time, big tech will put increased energy into being able to offer something comparable. That’s what’s so impressive about the current situation: suddenly and…” https://x.com/kimmonismus/status/2015785094791713006
“over the last few days clawdbots have created an entire new subsection of the internet. forums written, edited, and moderated by agents. but you can’t read any of it right now. the sites are all down because the code was written by other agents” https://x.com/jxmnop/status/2017362071571296401
“(not anti clawdbot – this is a general issue with any and all powerful AI assistants in that setup as long as prompt injections remain largely unsolved)” https://x.com/fabianstelzer/status/2015702808465420614
“have you tried ClawdBot yet? share you best use cases” https://x.com/TheTuringPost/status/2015422943057072582
“unplugged for 4 days and now i am too afraid to ask wtf is clawdbot” https://x.com/dejavucoder/status/2016341138740052126
“A bit more context e.g. from Simon https://t.co/Yeq0lLOPBF just wow” https://x.com/karpathy/status/2017297261160812716
OpenAI launches ads in ChatGPT with affiliate commerce integration
OpenAI is introducing two monetization strategies for ChatGPT’s free tier: intent-based ads similar to Google’s search ads (reportedly at $60 CPMs) and a 4% affiliate fee on native checkout through partners like Shopify and Walmart. This marks a pivotal shift as the first major AI chatbot to embrace advertising, potentially creating a $2-4 billion business in the near term while establishing the template for how AI companies will monetize free services at scale. The move signals that even cutting-edge AI products must follow traditional internet economics, with OpenAI promising ads won’t influence actual AI responses.
Monetizing AI surfaces: Ads in the age of AI https://www.tanayj.com/p/monetizing-ai-surfaces-ads-in-the
Claude’s Excel integration outperforms Microsoft’s native AI agent despite using same model
Anthropic’s Claude demonstrates superior spreadsheet analysis by conducting its own data interpretation rather than relying solely on Excel’s built-in functions like VLOOKUPs, while Microsoft’s Excel agent appears constrained by traditional spreadsheet operations. The integration gained 16 million impressions in 24 hours, with users reporting Claude significantly outperforms Google’s Gemini in similar spreadsheet tasks.
“Claude in Excel is really good. Its weird that using Microsoft’s own Excel agent using Claude 4.5 often yields weaker answers, It seems to be because the Excel agent relies on Excel alone (VLOOKUPs, etc) while Claude in Excel does its own analysis and uses Excel for output.” https://x.com/emollick/status/2014891787051999566
Claude in Excel | Claude https://claude.com/claude-in-excel
“16M impressions in 24 hours. if you’ve ever tried Claude in Sheets or Claude in Excel you will know how much more intelligent it is compared to Gemini in Sheets i have two current measures of Google-GDM product integration right now: – how long does it take Google to put a non…” https://x.com/swyx/status/2015207720237089146
Claude now runs interactive apps like Asana and Figma directly in chat
Anthropic launched MCP Apps, letting Claude users build project timelines, edit diagrams, and draft Slack messages without switching tabs. This goes beyond typical AI tool integrations by embedding live, interactive interfaces within conversations rather than just executing commands. The feature uses the open-source Model Context Protocol and is available immediately for all Claude subscription tiers.
Anthropic integrates interactive MCP apps into Claude https://www.testingcatalog.com/anthropic-integrates-interactive-mcp-apps-into-claude/
“We’ve launched the first official extension to MCP. MCP Apps lets tools return interactive interfaces instead of just plain text. Live in Claude today across a range of tools.” https://x.com/alexalbert__/status/2015854375051428111
“Your work tools are now interactive in Claude. Draft Slack messages, visualize ideas as Figma diagrams, or build and see Asana timelines.” https://x.com/claudeai/status/2015851783655194640
MCP Apps – Bringing UI Capabilities To MCP Clients | Model Context Protocol Blog https://blog.modelcontextprotocol.io/posts/2026-01-26-mcp-apps/
Interactive tools in Claude | Claude https://claude.com/blog/interactive-tools-in-claude
Claude AI successfully planned NASA’s first AI-driven Mars rover route
On December 8, NASA’s Perseverance rover completed a 400-meter drive using a route planned entirely by Anthropic’s Claude AI, marking the first time artificial intelligence has autonomously plotted a path for a vehicle on another planet. This breakthrough could cut Mars mission planning time in half and demonstrates AI’s potential to operate complex machinery across the 20-minute communication delay between Earth and Mars. The success opens possibilities for more autonomous space exploration, from lunar bases to distant planetary missions where human oversight becomes impractical.
“On December 8, the Perseverance rover safely trundled across the surface of Mars. This was the first AI-planned drive on another planet. And it was planned by Claude.” https://www.anthropic.com/features/claude-on-mars
Claude successfully taught open-source models to write GPU code
Researchers used Claude as a teacher to transfer complex programming skills to smaller, open models through an agent-based training process, demonstrating that advanced capabilities like writing CUDA kernels can be passed from proprietary to open-source AI systems. This breakthrough could democratize access to specialized programming knowledge previously locked in expensive commercial models.
“We got Claude to teach open models how to write CUDA kernels. This blog post walks you through transferring hard capabilities (like kernel writing) between models with agents skills. Here’s the process: – get a powerful model (like Claude Opus 4.5 or OpenAI GPT-5.2) to solve a…” https://x.com/ben_burtenshaw/status/2016534389685940372
Apple to unveil Google Gemini-powered Siri upgrade in February
Apple will announce a new Siri version in late February that uses Google’s Gemini AI models to access personal data and on-screen content, marking the first major result of their recent AI partnership. This represents a significant shift for Apple, which has struggled with its AI strategy and recently saw its AI chief depart. The upgrade promises to finally deliver on capabilities Apple announced in 2024, with an even more conversational ChatGPT-style version planned for June that could run on Google’s cloud infrastructure.
Apple will reportedly unveil its Gemini-powered Siri assistant in February | TechCrunch https://techcrunch.com/2026/01/25/apple-will-reportedly-unveil-its-gemini-powered-siri-assistant-in-february/
Cursor coding assistant gains ability to control multiple web browsers simultaneously
The AI-powered code editor now deploys separate AI agents to operate different browser windows at once, marking a shift from single-task automation to coordinated multi-browser workflows. This advancement could streamline complex development tasks that require monitoring multiple web applications or testing across different platforms simultaneously.
“Cursor can now use multiple browsers at once with subagents.” https://x.com/cursor_ai/status/2015863221589049483
Figure’s robot completes four-minute dishwasher task entirely autonomously
Figure’s Helix 02 achieved the longest autonomous humanoid task to date, continuously walking, manipulating objects, and maintaining balance across a full kitchen for four minutes without human intervention. The breakthrough replaces over 100,000 lines of traditional code with a single neural network trained on 1,000+ hours of human movement data, enabling the robot to coordinate its entire body while performing delicate tasks like dispensing precise syringe volumes and extracting individual pills using new tactile sensors and palm cameras.
“With Helix 02, Figure has introduced a new foundational layer for whole-body control that replaces 109k lines of hand-engineered C++ with a single neural prior for stable, natural motion. – Learned from 1,000+ hours of human motion data and sim-to-real RL across 200,000+…” https://x.com/TheHumanoidHub/status/2016356306773541080
“Introducing Helix 02. It’s our most powerful model to date – it’s using the whole body to do dishes end-to-end and it’s fully autonomous” https://www.figure.ai/news/helix-02
“oh no glassware lol (as always fully autonomous, running Helix 02)” https://x.com/adcock_brett/status/2016600196428550309
“Figure 03 autonomously handles glassware using the Helix 02 AI model. The palm cameras and tactile sensors are new hardware capabilities in Figure 03 that are used to enable delicate interactions such as this.” https://x.com/TheHumanoidHub/status/2016615936388976978
“Figure 03 demonstrating autonomous coordinated bimanual dexterity with the help of tactile sensing and palm cameras: – Unscrewing a bottle cap – Picking a pill from a medicine box – Dispensing exactly 5 ml from a syringe – Sorting metal pieces” https://x.com/TheHumanoidHub/status/2016237787067170949
Google launches Chrome’s Auto Browse agent for complex web tasks
Chrome’s new AI agent can autonomously handle multi-step browsing tasks like booking flights, filling forms, and shopping across multiple tabs, marking a significant shift from simple autofill to full web automation. Available now to Google AI Pro and Ultra subscribers, the agent runs on Gemini 3 and can perform up to 20-200 tasks daily depending on subscription tier. This represents one of the first mainstream deployments of agentic AI that can independently navigate websites and complete complex workflows without constant human oversight.
Chrome gets new Gemini 3 features, including auto browse https://blog.google/products-and-platforms/products/chrome/gemini-3-auto-browse/
Google begins rolling out Chrome’s “Auto Browse” AI agent today – Ars Technica https://arstechnica.com/google/2026/01/google-begins-rolling-out-chromes-auto-browse-ai-agent-today/
SerpApi offers programmatic access to Google search results via API
The service provides businesses with structured search data from Google and other engines, handling technical challenges like CAPTCHAs and geographic targeting that typically block automated queries. This matters because it enables companies to integrate real-time search insights into their applications without building complex web scraping infrastructure, with pricing starting at $25/month for 1,000 searches.
SerpApi: Google Search API https://serpapi.com/
Google researchers debunk the “more agents are better” myth in AI systems
Through testing 180 agent configurations, Google found that multi-agent AI systems boost performance by 81% on parallelizable tasks like financial analysis but degrade it by up to 70% on sequential tasks requiring step-by-step reasoning. The study introduces the first quantitative principles for designing agent systems, including a predictive model that correctly identifies optimal architectures for 87% of new tasks, moving the field from guesswork to engineering science.
Towards a science of scaling agent systems: When and why agent systems work https://research.google/blog/towards-a-science-of-scaling-agent-systems-when-and-why-agent-systems-work/
Google releases AlphaGenome DNA model to accelerate genetic research worldwide
Google’s AlphaGenome AI model, published in Nature, can predict how genetic changes affect molecular functions and is now freely available to academic researchers. This represents a significant step beyond general AI progress by providing scientists with a specialized tool to understand DNA sequences and drive biological discoveries. The open release of the model weights enables the global research community to build upon this genomics breakthrough.
We’re now making the AlphaGenome model and weights available to scientists around the world to further accelerate genomics research. Get access here: https://x.com/GoogleDeepMind/status/2016542490115912108
Our breakthrough AI model AlphaGenome is helping scientists understand our DNA, predict the molecular impact of genetic changes, and drive new biological discoveries. 🧬 Find out more in @Nature ↓ https://x.com/GoogleDeepMind/status/2016542480955535475
“AlphaGenome is our latest & most advanced genomics model published in @Nature today including making the model & weights available to academic researchers. Can’t wait to see what the research community will do with it. Congrats to the team on our newest front cover! #AI4Science” https://x.com/demishassabis/status/2016763919646478403
“I’m excited to share that AlphaGenome weights are now open!🧬 We just released the checkpoints of AlphaGenome, a DNA sequence model that helps scientists predict the molecular impact of genetic changes and do new biological discoveries” https://x.com/osanseviero/status/2016628065422762113
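If you want to poke at it yourself, the sketch below follows the quickstart pattern from the alphagenome Python package’s hosted client. The API key, variant coordinates, and ontology term are illustrative placeholders, and the exact method and attribute names are worth confirming against the official docs before relying on them.

```python
from alphagenome.data import genome
from alphagenome.models import dna_client

# Hosted-client sketch following the alphagenome quickstart; the key,
# variant coordinates, and ontology term below are placeholders.
model = dna_client.create('YOUR_ALPHAGENOME_API_KEY')

variant = genome.Variant(
    chromosome='chr22', position=36201698,
    reference_bases='A', alternate_bases='C',
)
# Predictions use a fixed-size sequence window centered on the variant.
interval = variant.reference_interval.resize(dna_client.SEQUENCE_LENGTH_1MB)

outputs = model.predict_variant(
    interval=interval,
    variant=variant,
    requested_outputs=[dna_client.OutputType.RNA_SEQ],
    ontology_terms=['UBERON:0001157'],  # a tissue context (transverse colon)
)
# Compare predicted RNA-seq tracks for reference vs. alternate alleles.
print(outputs.reference.rna_seq.values.shape,
      outputs.alternate.rna_seq.values.shape)
```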
AI agent matches medical students’ performance in realistic hospital simulations
Google’s Gemini 2.5 achieved case completion rates and diagnostic accuracy comparable to those of 14,000 medical students in physician-training simulations, suggesting AI could soon assist or supplement medical education, and potentially clinical decision-making, in controlled environments.
This paper puts a multimodal agent (using Gemini 2.5) into a realistic medical sim used to train physicians: “The AI agent matches or exceeds [14,000] medical students in case completion rates and secondary outcomes such as time and diagnostic accuracy” https://x.com/emollick/status/2016641414713704957
Google’s Gemini 3 Flash gains Agentic Vision for interactive image analysis
Instead of analyzing images in a single glance, Gemini 3 Flash now uses a “Think, Act, Observe” loop to zoom, crop, and manipulate images with Python code for deeper investigation. This active approach delivers 5-10% better performance on vision benchmarks and enables new capabilities like visual math solving and precise object counting. Early adopters like the building-plan validation platform PlanCheckSolver.com report 5% accuracy improvements from the model’s ability to iteratively inspect high-resolution details.
Introducing Agentic Vision in Gemini 3 Flash https://blog.google/innovation-and-ai/technology/developers-tools/agentic-vision-gemini-3-flash/
“Introducing Agentic Vision — a new frontier AI capability in Gemini 3 Flash that converts image understanding from a static act into an agentic process. By combining visual reasoning with code execution, one of the first tools supported by Agentic Vision, the model grounds…” https://x.com/GoogleAI/status/2016267526330601720
Google launches Agentic Vision in Gemini 3 Flash https://www.testingcatalog.com/google-launches-agentic-vision-in-gemini-3-flash/
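The “Think, Act, Observe” loop is simple to sketch. Below is a schematic in plain Python, not Google’s implementation: fake_model stands in for a Gemini 3 Flash call that either answers or asks to zoom, and the only “action” shown is a Pillow crop (the real tool can also run generated Python).

```python
from PIL import Image

# Schematic "Think, Act, Observe" loop. fake_model is a stand-in for a
# vision-model call that returns either a final answer or a crop request.

def fake_model(view: Image.Image, question: str) -> dict:
    if view.size[0] > 512:  # "think": view too coarse, request a zoom
        w, h = view.size
        return {"action": "crop", "box": (w // 4, h // 4, 3 * w // 4, 3 * h // 4)}
    return {"action": "answer", "text": "3 valves visible on the plan"}

def think_act_observe(image: Image.Image, question: str, max_steps: int = 5) -> str:
    view = image
    for _ in range(max_steps):
        step = fake_model(view, question)   # think
        if step["action"] == "answer":
            return step["text"]
        view = view.crop(step["box"])       # act: zoom; observe on next pass
    return "no answer within step budget"

print(think_act_observe(Image.new("RGB", (2048, 2048)), "How many valves?"))
```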
Google invests in Japan’s Sakana AI in strategic partnership deal
Google has made a financial investment in Tokyo-based Sakana AI while forming a strategic partnership to advance AI adoption in Japan’s regulated industries. The deal combines Google’s infrastructure and models like Gemini with Sakana’s research capabilities, targeting mission-critical sectors like finance and government that require high security standards. This marks a significant validation of Sakana’s approach to automated scientific discovery and AI agents in the Japanese market.
“We are thrilled to announce a strategic partnership with Google! Google is also making a financial investment in Sakana AI to strengthen this collaboration. This underscores their recognition of our technical depth and our mission to advance AI in Japan. We are combining…” https://sakana.ai/google/#en
Meta plans AI shopping assistants using personal data advantage by 2026
Zuckerberg announced Meta will launch new AI models and “agentic shopping tools” in coming months, leveraging users’ personal context including history and relationships to create personalized commerce experiences. This positions Meta against Google and OpenAI in the AI shopping assistant race, with the company dramatically increasing infrastructure spending to $115-135 billion in 2026. Meta’s unique advantage lies in its access to personal data across its platforms, which Zuckerberg claims will enable more contextual AI agents than competitors.
Zuckerberg teases agentic commerce tools and major AI rollout in 2026 | TechCrunch https://techcrunch.com/2026/01/28/zuckerberg-teases-agentic-commerce-tools-and-major-ai-rollout-in-2026/
Moonshot’s Kimi K2.5 becomes top open-weights AI model with native video understanding
The Chinese company’s latest model, a mixture-of-experts design that activates roughly 32 billion parameters per token, now ranks as the leading open-weights AI, featuring breakthrough capabilities like converting screen recordings directly into functional websites and managing swarms of up to 100 parallel AI agents. K2.5 outperforms previous open models on coding and instruction-following benchmarks while offering multimodal processing at half the cost of competing proprietary models. This represents the largest leap yet in closing the gap between open-source and frontier AI systems from OpenAI, Anthropic, and Google.
“Moonshot’s Kimi K2.5 is the new leading open weights model, now closer than ever to the frontier – with only OpenAI, Anthropic and Google models ahead. Key takeaways: ➤ Impressive performance on agentic tasks: @Kimi_Moonshot’s Kimi K2.5 achieves an Elo of 1309 on our GDPval-AA…” https://x.com/ArtificialAnlys/status/2016250137115557953
“very nice release by the kimi team, benchmarks are on par with opus 4.5, gpt 5.2 xhigh, gemini 3.0 pro. there is also some nice details on the parallel RL part in the tech blog explaining how they build K2.5 agent swarm” https://x.com/eliebakouch/status/2016025747144483060?s=20
[AINews] Moonshot Kimi K2.5 – Beats Sonnet 4.5 at half the cost, SOTA Open Model, first Native Image+Video, 100 parallel Agent Swarm manager https://www.latent.space/p/ainews-moonshot-kimi-k25-beats-sonnet
“🚨BREAKING: Kimi K2.5 Thinking by @Kimi_Moonshot debuts in Text Arena as the #1 open model, surpassing GLM-4.7 and ranking #15 overall. Highlights: – #1 Open model (+5pts vs GLM-4.7) – #7 Coding – #7 Instruction Following – #14 Hard Prompts. One of only two open models to break…” https://x.com/arena/status/2016294722445443470
Kimi K2.5: Now Top 1 on the OSWorld leaderboard. 🏆 With its Computer Use capabilities, you can now build powerful agents that navigate and operate computer interface just like a human. https://x.com/Kimi_Moonshot/status/2017292360099762378
“One-shot ‘Video to code’ result from Kimi K2.5. It not only clones a website, but also all the visual interactions and UX designs. No need to describe it in detail, all you need to do is take a screen recording and ask Kimi: ‘Clone this website with all the UX designs.’” https://x.com/KimiProduct/status/2016081756206846255
“Running Kimi K2.5 on my desk. Runs at 24 tok/sec with 2 x 512GB M3 Ultra Mac Studios connected with Thunderbolt 5 (RDMA) using @exolabs / MLX backend. Yes, it can run clawdbot.” https://x.com/alexocheema/status/2016404573917683754
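Since Moonshot serves its Kimi models through an OpenAI-compatible API, trying K2.5 should look roughly like the sketch below. The base URL matches Moonshot’s existing endpoint, but the “kimi-k2.5” model identifier is my guess; check their docs for the real ID.

```python
from openai import OpenAI

# Sketch of calling Kimi K2.5 through Moonshot's OpenAI-compatible API.
client = OpenAI(
    base_url="https://api.moonshot.ai/v1",
    api_key="YOUR_MOONSHOT_API_KEY",  # placeholder
)

response = client.chat.completions.create(
    model="kimi-k2.5",  # assumed model identifier
    messages=[
        {"role": "user", "content": "Outline an agent swarm that fans out 100 sub-agents."}
    ],
)
print(response.choices[0].message.content)
```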
OpenAI adds shopping cart and merchant tools to ChatGPT
ChatGPT is testing e-commerce features including a shopping cart for tracking products, merchant submission tools for sellers, and personalized responses in temporary chats. This mirrors Microsoft’s Copilot commerce strategy and signals OpenAI’s push to make ChatGPT handle daily transactions beyond conversation, potentially creating new revenue streams from shopping integrations.
OpenAI to add shopping cart and merchant tools to ChatGPT https://www.testingcatalog.com/openai-to-add-shopping-cart-and-merchant-tools-to-chatgpt/
Amazon leads $100 billion OpenAI funding round despite backing rival Anthropic
Amazon is negotiating to invest up to $50 billion in OpenAI as part of a massive $100 billion funding round that also includes Microsoft and Nvidia, a striking move given that Amazon has already invested billions in OpenAI competitor Anthropic. The deal would be the largest AI investment round in history and could close within weeks, with OpenAI’s valuation already at $500 billion. The cross-investment highlights how tech giants are hedging their bets across multiple AI leaders rather than backing a single horse in the rapidly evolving artificial intelligence race.
Nvidia, Microsoft, Amazon in Talks to Invest Up to $60 Billion in OpenAI — The Information https://www.theinformation.com/articles/nvidia-microsoft-amazon-talks-invest-60-billion-openai
Source: Amazon could invest up to $50B in OpenAI in coming weeks https://www.cnbc.com/2026/01/29/amazon-openai-investment-jassy-altman.html
OpenAI seeks US manufacturing partners for advanced robotics hardware supply
The AI company is moving beyond software into physical robotics by securing domestic suppliers for critical components like motors and gearboxes. This signals OpenAI’s ambition to control the entire robotics stack from AI software to hardware manufacturing, potentially reducing dependence on foreign suppliers while positioning for the emerging robotics market.
“OpenAI is seeking US manufacturing partners to secure hardware supply chains for advanced robotics, including key components like gearboxes, motors, and power electronics.” https://x.com/TheHumanoidHub/status/2015839316870889890
OpenAI builds AI agent to analyze 600+ petabytes of internal data
The company created an in-house AI system that lets employees query massive datasets using plain English, demonstrating how large tech companies are deploying AI to make their own data more accessible. This represents a shift from off-the-shelf AI tools to custom enterprise solutions that understand company-specific context and terminology.
“Inside our in-house AI data agent. It reasons over 600+ PB and 70k datasets, enabling natural language data analysis across Engineering, Product, Research, and more. Our agent uses Codex-powered table-level knowledge plus product and organizational context…” https://openai.com/index/inside-our-in-house-data-agent/
OpenAI tests whether smart AI can train even smarter AI systems
OpenAI researchers demonstrated that weaker AI models can successfully supervise and train stronger AI systems, achieving up to 97% of the performance that would come from ideal supervision. This “weak-to-strong generalization” approach could solve a critical future challenge: how humans might maintain control over AI systems that eventually surpass human capabilities in most domains.
Weak-to-strong generalization | OpenAI https://openai.com/index/weak-to-strong-generalization/
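The core recipe is easy to state in code: freeze the weak supervisor, have it label data, and finetune the strong student on those imperfect labels. Here’s a minimal PyTorch sketch of that loop, my simplification rather than OpenAI’s code; the paper also adds an auxiliary confidence loss that lets the student disagree with its supervisor, which is omitted here.

```python
import torch
import torch.nn.functional as F

# Minimal weak-to-strong step: a frozen weak model provides (imperfect)
# labels, and the strong student is trained on them.

def weak_to_strong_step(weak_model, strong_model, optimizer, x):
    with torch.no_grad():
        weak_labels = weak_model(x).argmax(dim=-1)  # imperfect supervision
    logits = strong_model(x)
    loss = F.cross_entropy(logits, weak_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with linear probes standing in for language models.
weak = torch.nn.Linear(16, 2)
strong = torch.nn.Linear(16, 2)
opt = torch.optim.AdamW(strong.parameters(), lr=1e-3)
print(weak_to_strong_step(weak, strong, opt, torch.randn(8, 16)))
```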
OpenAI launches Prism, a free AI workspace for scientific writing and collaboration
Prism combines GPT-5.2 with LaTeX editing to help scientists write research papers, generate diagrams from uploaded papers, and collaborate on projects. The tool aims to accelerate scientific progress by making AI assistance accessible to researchers, students, and anyone studying complex academic material. Unlike general writing assistants, Prism is specifically designed for scientific workflows and technical document creation.
⚡️ Prism: OpenAI’s LaTeX “Cursor for Scientists” — Kevin Weil & Victor Powell, OpenAI for Science – YouTube https://www.youtube.com/watch?v=W2cBTVr8nxU
“💥 Today we’re introducing Prism—a free, AI-native workspace for scientists to write and collaborate on research, powered by GPT-5.2. Accelerating science requires progress on two fronts: 1. Frontier AI models that use scientific tools and can tackle the hardest problems 2. …” https://x.com/kevinweil/status/2016210486778642808
Introducing Prism | OpenAI https://openai.com/index/introducing-prism/
“OpenAI Prism is *NOT* only for researchers. Uploaded Google’s ‘Nested Learning’ paper for Continual Learning and had it generate a diagram that visualizes the paradigm in a simple way. Great if you’re a student or curious person who wants to learn too. Plus it’s free.” https://x.com/daniel_mac8/status/2016554325691015604
Vidu Q3 Pro becomes second-best AI video generator, beating Runway and Kling
Chinese startup Vidu’s latest model outperformed established competitors Runway Gen-4.5 and Kling 2.5 Turbo in independent benchmarks, trailing only xAI’s Grok Imagine. This marks a notable shift in the text-to-video landscape, with a newer player challenging dominant Western models through significant technical improvements over its previous version.
“Vidu Q3 Pro ranks #2 in Text to Video in the Artificial Analysis Video Arena, surpassing Runway Gen-4.5 and Kling 2.5 Turbo while trailing only xAI’s Grok Imagine! Vidu Q3 Pro is the latest release from @ViduAI_official, representing a significant upgrade from their Vidu Q2…” https://x.com/ArtificialAnlys/status/2017225053008719916
xAI raises $20 billion at $230 billion valuation with Tesla investing $2 billion
Elon Musk’s AI company completed its largest funding round to date, attracting major investors including Nvidia and Cisco to expand its Colossus supercomputer infrastructure and train the next Grok model. The deal positions xAI as one of the most valuable AI companies globally, leveraging access to X’s 600 million users for real-time data training. Tesla’s strategic investment suggests deeper integration between the automaker and xAI’s AI capabilities.
“Tesla has agreed to invest $2 billion in xAI’s Series E funding round. Tesla and xAI have also entered into a framework agreement that was established to evaluate future collaborations. ‘The investment and the related framework agreement are intended to enhance Tesla’s ability…’” https://x.com/TheHumanoidHub/status/2016628661789872570
xAI raises $20B Series E at ~$230B valuation | AINews https://news.smol.ai/issues/26-01-06-xai-series-e
xAI’s Grok Imagine claims top rankings in video generation benchmarks
Elon Musk’s xAI released Grok Imagine, a new AI model that reportedly ranks #1 in text-to-video and image-to-video generation on the Artificial Analysis Video Arena, beating established competitors like Runway and Google’s models. This marks xAI’s first major entry into the competitive AI video generation market, positioning the company alongside OpenAI and Google in multimedia AI capabilities. The model offers both video creation and editing features across multiple input types.
“xAI’s Grok Imagine takes the #1 spot in both Text to Video and Image to Video in the Artificial Analysis Video Arena, surpassing Runway Gen-4.5, Kling 2.5 Turbo, and Veo 3.1! Grok Imagine is the latest video model from @xAI, and joins an increasing roster of models such as…” https://x.com/ArtificialAnlys/status/2016749756081721561
“🚨BREAKING: @xAI’s first model in Video Arena debuts in the top 3! Grok-Imagine-Video ranks #3 on the Image-to-Video Arena and #4 on the Text-to-Video Arena. It is close to the top-ranked @GoogleDeepMind Veo 3.1 and @OpenAI Sora 2 Pro models. Grok-Imagine-Video offers: …” https://x.com/arena/status/2016748418635616440
“@xai Try New Grok Imagine here! Text to Image https://t.co/OeJMwL9hoH Image Editing https://t.co/Q7lojX41I1 Text to Video https://t.co/fAzEJABTYn Image to Video https://t.co/zTdoJQjkqk Video Editing…” https://x.com/fal/status/2016746473887609118