About This Week’s Covers

This week’s main cover and category images are inspired by my friend John Bayalis, who is an incredible artist, adventurer, and photographer, as well as a pop-culture and cinema critic who could easily hang with Chuck Klosterman. He’s one of the best illustrators I know, but he doesn’t draw much! John also inspired my New Order cover theme a few months ago. He recently published a 20-page booklet of cloud photography, “The Speed of Clouds,” inspired by the 1983 Francis Ford Coppola film “Rumble Fish.” John wanted to depict the many ways clouds can inform the human experience through motion, shadows, colors, and precipitation. John’s projects come with Spotify playlists too!

I’ve known John Bayalis for most of my life. He inspires me in almost every facet of what I do.

John is a very special friend to me on many levels. One of them is that he’s known me since I was a kid, and he knows I was an English major, that I love the arts, that I love playing music, and that I love the humanities. When I explore AI, John never sees it through the lens of an imposter, the grotesque, or competition. He sees the best in me, lets me play, learn, and express myself with technology, and is always supportive.

I tried to sum it up in a text that I sent to him:

“I like the juxtaposition of my English major and love for the arts along with supporting you, and also testing the limits of technology as an art project itself. I identify more like a Bowie or Reznor than some douche-canoe tech bro slop machine. But that’s pretty damn subjective and scary since I don’t get to define how people see it.”

For my main cover, I used a photo I took during a five-mile run on the beach this fall. John loves bright neon colors and loneliness (like me), and the green of the Ferris wheel against the grey sky really reminded me of him. The cover is just a picture I took with a title added on top of it—no generative artificial intelligence necessary.

For the category covers this week, I took John’s project description from his website, combined it with my automated cover image Python scripts, and fed it to Claude…asking Claude to generate responses that would fit the scripts yet embody the feeling of John’s mission, applied to each of my categories.

I’ll share that information below. Here’s what Claude gave me:

Short theme name:
"The Speed of Clouds" (or just "Clouds" if you want it tighter)

One-sentence logline:
Each AI newsletter category reimagined as a stark black-and-white cloud photograph with bold title card typography, inspired by the way clouds mark time, cast shadows, and reveal truth through their ever-changing forms.

Longer description:
Like clouds, AI developments move at speeds we barely register until their shadows fall across our lives. This week's images treat each category as a distinct cloudscape - some heavy with precipitation (regulation, ethics), some racing across clear skies (breakthroughs, tools), some casting long shadows (concerns, labor), and some barely visible wisps on the horizon (emerging tech). The bold title cards anchor each image like movie credits against sky, creating simple compositions where the cloud formation itself embodies the category's character. Inspired by John's 2025 cloud archive and the timeless black-and-white aesthetic of Rumble Fish, where clouds weren't just weather but emotional punctuation.

Visual style keywords:
Cinematic black and white photography, bold sans-serif title cards, Rumble Fish aesthetic, minimal composition, dramatic cloud formations, high contrast, film grain texture, contemplative, skyward perspective

Color palette/mood:
Pure black and white with rich grayscale gradients, high contrast between clouds and sky, moody atmospheric depth, occasional stark whites and deep blacks, film photography tonality, contemplative and cinematic

Global constraints: 
Square format, black and white only, category name as bold title card text, simple sky/cloud compositions, no people, minimal or no ground/landscape, cloud formation carries meaning, avoid AI brain in clouds cliché

John’s Description: The Speed of Clouds | Atlanta, GA | December 2025
Inspired by the 1983 Francis Ford Coppola film Rumble Fish, and curated from an archive of cloud photography compiled during 2025, The Speed of Clouds depicts the many ways clouds inform the human experience. They indicate time through their motion, and they tell their indisputable truths through the shadows they cast, the colors they reflect, and the precipitation they generate.

They serve as companions that go largely unnoticed until they make their presence felt. The collection here was reproduced in a 20-page, 4″ x 4″ artifact highlighting the concepts of time, motion, and truth and given as gifts of appreciation to recognize the passing of 2025.

Here are a few of my favorite category images that the subsequent prompts generated (Claude wrote the prompts and gave them to Gemini via API, without me helping). Neither Claude nor Gemini has ever seen his photographs. These are only from the descriptive text of his project + my category names (using the code above).
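
If you’re curious what that pipeline looks like, here’s a minimal sketch (not my actual scripts): the model names, prompt wording, and file names are illustrative stand-ins, but the two building blocks, Anthropic’s Python SDK for Claude and the google-genai SDK for Gemini image generation, are the real ones.

```python
# Minimal sketch: Claude writes an image prompt from John's project text,
# then Gemini renders it. Model ids and prompt wording are illustrative.
import anthropic
from google import genai

PROJECT_TEXT = open("speed_of_clouds_description.txt").read()  # John's description
CATEGORY = "Agents"  # one of my newsletter categories

# 1) Ask Claude to turn the description + category into an image prompt.
claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
message = claude.messages.create(
    model="claude-sonnet-4-5",  # illustrative model id
    max_tokens=400,
    messages=[{
        "role": "user",
        "content": f"Using this project description:\n{PROJECT_TEXT}\n"
                   f"Write one black-and-white cloud-photo image prompt for the "
                   f"category '{CATEGORY}', with the category name as a bold title card.",
    }],
)
image_prompt = message.content[0].text

# 2) Hand the prompt to Gemini's image model and save whatever comes back.
gemini = genai.Client()  # reads GEMINI_API_KEY from the environment
response = gemini.models.generate_content(
    model="gemini-2.5-flash-image",  # the "Nano Banana" image model
    contents=image_prompt,
)
for part in response.candidates[0].content.parts:
    if part.inline_data:  # image bytes come back as inline data
        with open(f"{CATEGORY.lower()}_cover.png", "wb") as f:
            f.write(part.inline_data.data)
```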

This week’s humanities reading comes from John’s booklet: the poem “I Wandered Lonely as a Cloud” by William Wordsworth:

I wandered lonely as a cloud
That floats on high o’er vales and hills,
When all at once I saw a crowd,
A host, of golden daffodils;
Beside the lake, beneath the trees,
Fluttering and dancing in the breeze.

Continuous as the stars that shine
And twinkle on the milky way,
They stretched in never-ending line
Along the margin of a bay:
Ten thousand saw I at a glance,
Tossing their heads in sprightly dance.

The waves beside them danced; but they
Out-did the sparkling waves in glee:
A poet could not but be gay,
In such a jocund company:
I gazed—and gazed—but little thought
What wealth the show to me had brought:

For oft, when on my couch I lie
In vacant or in pensive mood,
They flash upon that inward eye
Which is the bliss of solitude;
And then my heart with pleasure fills,
And dances with the daffodils.
https://www.poetryfoundation.org/poems/45521/i-wandered-lonely-as-a-cloud

This Week By The Numbers

Total Organized Headlines: 432

This Week’s Executive Summaries

This week, I organized 432 links, and 74 of them informed the executive summaries. There are three top stories, a few fun items, and then I organize the rest of the executive summaries by company (and occasionally topic), in alphabetical order.

Buried lede (down in the OpenAI Section):

According to human judges, GPT-5.2 can outperform top industry professionals on 70% of knowledge-work tasks. That would mean it’s better at doing presentations and spreadsheets. It can also work over 11 times faster and at 1% of the cost of a human expert.

Top Story #1: Google Is Remaking the Internet and Operating Systems

The top story this week is a continuation of the theme of predetermined user interfaces starting to go away.

Dynamic Content
The internet first broke open this concept when it started to transform the concept of a catalog or print material. After a newspaper or catalog was sent to the printing press, it was set like concrete, delivered out to the masses, and could never be changed…

The internet introduced the idea of dynamic content that could fill a template… (I see you .cfm, .asp, .jsp, and .cgi people).

So a product detail page could change if new information needed to be added. Photos could be updated and swapped. A user could open the same basic template, and the template could be filled with different products, different descriptions, and different prices. This was called dynamic content, and it enabled incredible flexibility and real-time news, product, retail, and personal experiences.

The idea of a predetermined, fixed product that was shipped to all consumers…and then never able to change…started to go away, because the internet could update at any given moment.

Personalization
With social media, personalization took that to another level… individual experiences for everyone, even if they were going to the same website.

When we visit Facebook or Instagram or TikTok or Amazon, we’re seeing a completely different website or newsfeed than anyone else.

Dynamic Interfaces and User Experiences
Now we’re starting to shift beyond dynamic content and personalization… to dynamic interfaces.

For the past two years I’ve been saying the web browser is going away, and now we’re finally starting to see how this will happen. But it’s not just the web browser. It’s operating systems and entire user interfaces no longer being predetermined. And Google’s leading the charge.

Today’s announcement from Google is a new product called Disco.
Disco combines generative AI and language interfaces with the traditional web-browsing experience to build interactive tools.
https://blog.google/innovation-and-ai/models-and-research/google-labs/gentabs-gemini-3

Instead of searching and opening tabs by hand, you describe what you want in the search bar, and Google’s tool opens websites and resources as you discuss your ideas with it. Then Disco combines everything you’ve opened and reviewed into an interactive summary.

It could be as simple as building out a game with flashcards, or it could be a multimedia demonstration, a trip planning guide, or a walkthrough of a scientific principle. It could be a shopping comparison tool…taking all of the products you’ve opened and looked at and building a table that creates a structured methodology for comparing the choices in a clear way to help you make a decision.

The key is that there is no pre-existing destination. Instead, Google combines all of the ingredients and bakes them into a cake that has never existed before.

It’s like having a completely customized experience across the user interface, the content, the timing, and the choices. It’s unique to you. It will only exist once, and it goes away once you close your window.

If you want an analogy, think of print photographs and how generations used to curate, keep, and hoard all of their memories in coffee-table albums. Now everyone just takes pictures with their phone and treats them almost like disposable assets.

The idea of a website used to be this pearl-clutching creation, with user experience, focus groups, and a lot of debate. Now a website will simply be a disposable thing that you use as needed and then move on to the next one—and it doesn’t even exist beyond the moment it’s needed.

Evan Spiegel’s video is absolutely underrated, a must-see, and makes this exact point:

https://blog.google/innovation-and-ai/models-and-research/google-labs/gentabs-gemini-3

Top Story #2: Is Google The PowerPoint Killer?

Almost every new image-model release lately has two key areas of improvement. One is complex composition, meaning more and more specific elements can be included in an image without the image breaking. The second is the ability to create very strong, text-heavy illustrations. The fact that AI never could get text right…and now is absolutely amazing at text (and fingers)…is something.

Beyond that, because image models are now multimodal, models like Google Gemini’s Nano Banana can actually understand the context of an image. I have an old example of this from something called Google Flamingo that I always use in my presentations. It’s a harbinger from three or four years ago of where things were heading:

Now you can draw an arrow on a map and ask Google Gemini to draw you a picture of what the arrow sees. Or you can give Gemini a scientific paper, and Gemini will create a series of illustrations and graphics to explain it visually.
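
If you want to play with this kind of image understanding yourself, here’s a tiny sketch using the google-genai SDK. The file name and prompt are placeholders, and the model id is just a current general-purpose Gemini model:

```python
# Toy example: hand Gemini an image plus a question about it.
# File name, prompt, and model id are placeholders.
from google import genai
from PIL import Image

client = genai.Client()  # reads GEMINI_API_KEY from the environment
map_image = Image.open("map_with_arrow.png")  # e.g., a map with a hand-drawn arrow

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[map_image, "Describe the view someone would see looking along this arrow."],
)
print(response.text)
```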

Google’s classic study app, NotebookLM, has now integrated Nano Banana into its core product and can generate PowerPoints based on the resources you add to your notebook.

This is probably the breakthrough of the year in 2025. If I had to pin down one new consumer product, I’d say it would be Nano Banana Pro’s ability to create graphics.

Ethan Mollick shares this example:
“I did not expect that the PowerPoint killer would be something called Nano Banana Pro, but that is where its heading It makes the major efforts by all the other AI companies, including Microsoft, to crack PowerPoint by using python seem like a dead end ImageGen is all you need? https://x.com/emollick/status/1998520025951752278”

Top Story #3: Tech Paper: Using Z.ai to Operate an Android Phone

AutoGLM
Over the past few months, we’ve seen that open-source technology is about six to 10 months behind the leading frontier models—which means that if frontier models don’t keep innovating, they’ll be rendered obsolete by free alternatives in less than a year.

A few interns at Z.ai created an autonomous open-source agent that can use an Android phone without the user needing to open anything, as a proof of concept. If Apple and Google don’t start releasing agentic phone interfaces, where you can talk to your phone instead of opening apps, we’re going to see this happen with or without them, especially on Android. For all I know, an open-source phone will show up and just destroy both Android and iPhone.

Years ago, Apple came out with an open-source large action model that could navigate the iPhone without opening apps, but Apple has been sitting on it—probably out of fear of losing revenue from the App Store. We’ll see if Apple can pivot quickly enough to keep up. I’d give them another six to 12 months’ window before they have to make a move.

I wrote an article about this called “Apple is pulling a Braveheart and can change the way we use phones whenever they choose.”

AutoGLM
https://xiao9905.github.io/AutoGLM/

Quick Stop Before The Summaries

Before we hit the rest of the week’s summaries, let’s review an interesting end-of-year post in three graphics from Ethan Mollick:

Agentic Time on Task
The first is a graph showing the exponential growth of AI agents and their ability to focus on a task without losing their place. What’s wild is that the graph Ethan shared is actually outdated: it shows GPT Codex Max as the winner, but Claude Opus is now at almost five hours. I’ll share the new graph below Ethan’s graph, just to be sure we’re up to speed.

Ethan’s graph from March

The NEW graph from November!


This benchmark is an important one for measuring agentic reasoning. You’ll see more benchmarks later from Google and OpenAI (measuring web research effectiveness and a variety of mundane business tasks).

In this case, the METR benchmark is simply measuring how long a computer can use a tool and work on its own without getting lost. It’s kind of like how much information you can hold in your brain. My analogy to humans might be: how long can we focus on an idea without daydreaming or getting tired? It’s testing the AI’s context window and ability to handle logical steps and refinements.

When AI attempts a complex task, it often has to stop and think about how it’s doing, or troubleshoot. If you tell an AI agent to go out and buy you a plane ticket to a certain place at a certain price in a certain seat, it may have to run a bunch of nested operations: testing, coming back with ideas, trying something else, and so on. That, in general, is very difficult for a computer…especially compared to classical coding, where even the smallest error or a single typo breaks the entire program.

In this case, there’s really no program. The computer is working on its own, trying to go out there and not screw up. We’ve all seen a robot walking around before it falls over. The simplest hurdle makes the robot fall down.

Measuring an agent’s capability to think without losing track of what it’s doing…is a valuable benchmark. As these minutes become hours, the tasks an AI can do start to evolve from a simple math problem all the way to trying to solve difficult scientific problems. The rate of improvement is what everyone is tracking, because as long as something keeps improving exponentially, the impacts are difficult to imagine.

Enterprise ROI
The second graphic from Ethan Mollick shows that three-quarters of businesses are reporting a positive return on investment from generative AI.

Workforce Adoption
The third shows that almost 50% of the U.S. workforce is using gen AI at work.

Agents

Non-Profit AI Agent Powerhouse
Anthropic, OpenAI, and Block have donated their AI agent protocols and placed them under a nonprofit called the Agentic AI Foundation.

Explaining this could sound complicated to a layperson, but at the end of the day, getting AI systems to talk to each other is important. And rather than having every connector become proprietary, the three companies who designed the major connection protocols have decided to move them out from under themselves and put them under a nonprofit… so the protocols can remain compatible rather than becoming competing standards.

Anthropic built the MCP protocol, which is a very popular way to connect tools to AI systems. OpenAI has the agents.md format to give agents instructions and an understanding of projects, and Block has an open-source system called Goose that runs locally and can execute tasks and vibe code.
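
To make that concrete, here’s roughly what a bare-bones MCP server looks like with the official Python SDK’s FastMCP helper. The tool itself is a toy I made up; the point is how little code it takes to expose a tool that any MCP-speaking model can call:

```python
# A toy MCP server: exposes one tool over the standard protocol.
# The tool is invented for illustration; the SDK calls are the official ones.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("newsletter-tools")

@mcp.tool()
def count_headlines(text: str) -> int:
    """Count non-empty lines in a block of pasted headlines."""
    return sum(1 for line in text.splitlines() if line.strip())

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default, so a local client can attach
```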

A good analogy might be USB connectors over the years… these are very helpful for connecting your computer to power sources, charging your phone, or connecting an external hard drive. But when Apple changes the style of their connectors…from Lightning to some new variety of USB…it’s ridiculously complicated to find a cord.

So rather than have each of the frontier models build competing connections to your email, your Google Calendar, or third-party systems, these protocols are going to be centralized in a nonprofit. Google, Microsoft, Amazon, Cloudflare, and Bloomberg are also supporters.

The fund is going to operate under the Linux Foundation, a long-standing open-source organization.

https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation
https://modelcontextprotocol.io/docs/getting-started/intro
https://openai.com/index/agentic-ai-foundation/
https://agents.md/
https://aaif.io/
https://block.xyz/inside/block-anthropic-and-openai-launch-the-agentic-ai-foundation
https://block.github.io/goose/
https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation

Anthropic

Broadcom reveals its mystery $10 billion customer is Anthropic
“Broadcom said in a September call that it had signed a customer that had placed a $10 billion order for custom chips. On Thursday, CEO Hock Tan revealed that the mystery customer was Anthropic. Tan on Thursday also said Anthropic had placed an additional $11 billion order with Broadcom in the company’s latest quarter.”
https://www.cnbc.com/2025/12/11/broadcom-reveals-its-mystery-10-billion-customer-is-anthropic.html

We Got Claude to Fine-Tune an Open Source LLM
“We gave Claude the ability to fine-tune language models using a new tool called Hugging Face Skills. Not just write training scripts, but to actually submit jobs to cloud GPUs, monitor progress, and push finished models to the Hugging Face Hub. This tutorial shows you how it works and how to use it yourself.”

“This isn’t a toy demo. The skill supports the same training methods used in production: supervised fine-tuning, direct preference optimization, and reinforcement learning with verifiable rewards. You can train models from 0.5B to 70B parameters, convert them to GGUF for local deployment, and run multi-stage pipelines that combine different techniques.” https://huggingface.co/blog/hf-skills-training

The HuggingFace team just got Claude Code to fully train an open LLM.
You just say something like: “Fine-tune Qwen3-0.6B on open-r1/codeforces-cots.” Claude handles the rest.

▸ Picks the best cloud GPU based on model size
▸ Loads dataset (or searches if not specified)
▸ Launches job: test or main run
▸ Tracks progress via Trackio dashboard
▸ Uploads checkpoints and final model to Hugging Face Hub
https://x.com/LiorOnAI/status/1997754848255807874
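
For context, a supervised fine-tuning job like the one in that example boils down to something like this TRL sketch. This is my simplification, not the Hugging Face skill’s actual code; the model and dataset names come from the tweet, and the training arguments are illustrative:

```python
# Rough shape of the SFT job the skill automates (simplified; not the skill's code).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Dataset and model names taken from the example prompt in the tweet.
dataset = load_dataset("open-r1/codeforces-cots", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen3-0.6B",      # TRL loads the model from the Hub by name
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="qwen3-codeforces-sft",
        num_train_epochs=1,        # illustrative hyperparameters
        push_to_hub=True,          # upload the finished model, as the skill does
    ),
)
trainer.train()
```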

Claude Code is coming to Slack, and that’s a bigger deal than it sounds | TechCrunch
“Previously, developers could only get lightweight coding help via Claude in Slack — like writing snippets, debugging, and explanations. Now they can tag @Claude to spin up a complete coding session using Slack context like bug reports or feature requests. Claude analyzes recent messages to determine the right repository, posts progress updates in threads, and shares links to review work and open pull requests.

The move reflects a broader industry shift: AI coding assistants are migrating from IDEs (integrated development environment, where software development happens) into collaboration tools where teams already work.” https://techcrunch.com/2025/12/08/claude-code-is-coming-to-slack-and-thats-a-bigger-deal-than-it-sounds/
https://claude.com/blog/claude-code-and-slack

Google

Waymo is Coming to London
“Our vehicles are now driving in London as we prepare for commercial service in 2026.” https://x.com/Waymo/status/1998075104752713981

The Waymo Foundation Model: Demonstrably Safe AI For Autonomous Driving
Waymo’s team at Google published a very high-level blog post describing Waymo’s approach to building autonomous vehicles under what they’re calling the Waymo Foundation Model.

To be honest, I don’t really understand the point—other than PR and adding “Foundation Model” to be sure they can map their product to simulation tools from NVIDIA, Tesla, and other competitors.

It’s not a technical document. Instead, it’s a very high-level picture of the puzzle pieces behind how Waymo trains its driverless vehicles and the brain that runs them.

I’ll be interested to see where this heads, because it seems to me they’re making the announcement mostly for branding and to reinforce public confidence.
https://waymo.com/blog/2025/12/demonstrably-safe-ai-for-autonomous-driving

Google’s Deep Research Agent
“Introducing the Gemini Deep Research agent for developers. It can create a plan, spot gaps, and autonomously navigate the web to produce detailed reports.”
https://blog.google/innovation-and-ai/technology/developers-tools/deep-research-agent-gemini-api

Google has released a powerful research agent that can be run within any application using the Google API. Basically, the API runs the agent out in the cloud so you can embed the agent inside your app. Then the agent can work within your app’s needs and come back with answers.

Deep Research Agent can iterate on very complicated tasks (aka enterprise financial and scientific queries against huge datasets). It can formulate a plan, read results, identify problems, and redo searches until it gets what it needs.
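
I haven’t run it myself yet, but based on the announcement, calling the agent from the google-genai SDK should look roughly like any other model call. Treat the model id below as a placeholder and check Google’s docs for the real invocation:

```python
# Hedged sketch: invoking the Deep Research agent through the Gemini API.
# The model id is a placeholder; consult Google's documentation for the real one.
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment
response = client.models.generate_content(
    model="deep-research",  # placeholder id for the Deep Research agent
    contents="Survey the last 12 months of FDA approvals for GLP-1 drugs "
             "and summarize pricing trends, with sources.",
)
print(response.text)
```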

This agent is so strong that Google’s deep research team has built a new benchmark that will go hand-in-hand with two other agentic benchmarks for research: Humanity’s Last Exam and BrowseComp. The new Google benchmark is called DeepSearchQA.

DeepSearchQA is a series of 900 handcrafted tasks across 17 fields, where every step depends on prior analysis. It’s not just about being able to gather facts, but also about generating in-depth answer sets and demonstrating research precision as well as retrieval recall.

One of the things I’ve noticed often in the past few months is the length of time an agent can think. This DeepSearchQA benchmark is also being touted as a diagnostic tool to look at the benefits of an agent being able to think over a long period of time.

Deep Research Agent fits into a larger equation as a piece that becomes greater than the sum of its parts. I think we’re going to start to see computer systems that surprise and delight us…whether it’s finding our lost luggage or mining huge files of medical records. These are going to be things that are so impactful and helpful that we’ll see effects on daily life in the next 12 months.

Google AlphaEvolve in Google Cloud
Reading the blog post, at first Google’s AlphaEvolve sounds almost like the sibling of Google’s Deep Research. However, it’s much more sophisticated… but also accessible via API and in the cloud.

AlphaEvolve is an algorithm design agent that uses an evolutionary approach to discovering and improving algorithms…by iterating, mutating, evolving, and looping toward a goal of improving or finding an algorithm.

AlphaEvolve is aimed at helping large biotech, pharma, logistics, financial services, and energy companies.
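
AlphaEvolve itself is proprietary, but the evolutionary loop underneath it is an old, simple idea. Here’s a toy version where every number is invented and the fitness function stands in for whatever metric (compute recovered, kernel speed) a real deployment would score candidates against:

```python
# Toy evolutionary search: mutate candidates, keep the fittest, repeat.
# Everything here is illustrative; AlphaEvolve evolves code, not two numbers.
import random

def fitness(candidate):
    # Stand-in objective: pretend (3, -1) is the "best algorithm."
    x, y = candidate
    return -((x - 3) ** 2 + (y + 1) ** 2)

def mutate(candidate, scale=0.5):
    return tuple(value + random.gauss(0, scale) for value in candidate)

population = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(20)]
for generation in range(200):
    population.sort(key=fitness, reverse=True)   # score and rank candidates
    survivors = population[:10]                  # keep the fittest half
    children = [mutate(random.choice(survivors)) for _ in range(10)]
    population = survivors + children            # next generation

print("best candidate found:", max(population, key=fitness))
```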

“Proven impact at Google
At Google, we have already used this technology to tackle some of our own hardest engineering problems.

Data center efficiency: AlphaEvolve found a better way to schedule tasks in our data centers, continuously recovering on average 0.7% of our global compute resources.

Gemini training: AlphaEvolve sped up a vital kernel in Gemini’s architecture by 23%, leading to a 1% reduction in Gemini’s training time.

Hardware design: It accelerated the design of our next-generation TPUs by discovering more efficient arithmetic circuits.”

“How AlphaEvolve can help businesses across industries
You can apply this same engine to your own proprietary data and unique algorithmic challenges. Here are a few ways improved algorithms can potentially help different industries:

Biotech and pharma: Optimize the algorithms used for molecular simulation, which helps shorten the timelines for drug discovery and increases the success rate of new therapeutics.

Logistics and supply chain: Discover superior heuristics for routing and inventory management, helping you reduce fuel costs and build more resilient delivery networks.

Financial services: Evolve algorithmic risk models to manage complex portfolios more effectively.

Energy: Optimize load balancing on smart grids to improve stability and better integrate renewable energy sources.” https://cloud.google.com/blog/products/ai-machine-learning/alphaevolve-on-google-cloud

Google Announces the FACTS Benchmark
“It’s the industry’s first comprehensive test evaluating LLM factuality across four dimensions: internal model knowledge, web search, grounding, and multimodal inputs.”

“Large language models (LLMs) are increasingly becoming a primary source for information delivery across diverse use cases, so it’s important that their responses are factually accurate.”

“Today, we’re teaming up with Kaggle to introduce the FACTS Benchmark Suite. It extends our previous work developing the FACTS Grounding Benchmark, with three additional factuality benchmarks, including:

A Parametric Benchmark that measures the model’s ability to access its internal knowledge accurately in factoid question use-cases.

A Search Benchmark that tests a model’s ability to use Search as a tool to retrieve information and synthesize it correctly.

A Multimodal Benchmark that tests a model’s ability to answer prompts related to input images in a factually correct manner.

We are also updating the original FACTS grounding benchmark with Grounding Benchmark – v2, an extended benchmark to test a model’s ability to provide answers grounded in the context of a given prompt.”

FACTS Benchmark Suite: a new way to systematically evaluate LLMs factuality – Google DeepMind
https://deepmind.google/blog/facts-benchmark-suite-systematically-evaluating-the-factuality-of-large-language-models/

Google Glass Is Back?
Google to launch first of its AI glasses in 2026
“Google on Monday said it plans to launch the first of its AI-powered glasses in 2026, as the tech company ramps up its efforts to compete against Meta in a heating consumer market for AI devices. The company said it plans to release audio-only glasses with its Gemini AI assistant and glasses that will include an in-lens display.”
https://www.cnbc.com/2025/12/08/google-ai-glasses-launch-2026.html

Throwback to Rori and Chloe wearing my Google Glasses back in 2013!

The Android Show: New features for Galaxy XR and a look at future devices https://blog.google/products-and-platforms/platforms/android/android-show-xr-edition-updates/

Google Web Partnerships
New Google web ecosystem tools and partnerships
“We’re expanding features like Preferred Sources to connect people to the web, and we’re updating our partnerships with news publications and creators for the AI era.”
https://blog.google/products-and-platforms/products/search/tools-partnerships-web-ecosystem/

Announcing Model Context Protocol (MCP) support for Google services
“With the recent launch of Gemini 3, we have the state-of-the-art reasoning to help you learn, build, and plan anything. But for AI to truly be an “agent”, to pursue goals and solve real-world problems on behalf of users, it needs more than just intelligence; it needs to reliably work with tools and data.

Anthropic’s Model Context Protocol (MCP), often likened to a “USB-C for AI”, has quickly become a common standard to connect AI models with data and tools. MCP enables AI applications to execute the complex multi-step tasks it takes to solve real world problems. However, implementing Google’s existing community-built servers often requires developers to identify, install, and manage individual, local MCP servers or deploy open-source solutions–placing the burden on developers, and often leading to fragile implementations.

Today we’re announcing the release of fully-managed, remote MCP servers. Google’s existing API infrastructure is now enhanced to support MCP, providing a unified layer across all Google and Google Cloud services. Developers can now simply point their AI agents or standard MCP clients like Gemini CLI to a globally-consistent and enterprise-ready endpoint for Google and Google Cloud services.” https://cloud.google.com/blog/products/ai-machine-learning/announcing-official-mcp-support-for-google-services/
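
In practice, “pointing a client at a remote MCP server” can be as small as this Python SDK snippet. The endpoint URL is a placeholder, not a real Google service address:

```python
# Connect to a remote (streamable-HTTP) MCP server and list its tools.
# The URL is a placeholder; substitute the real endpoint from Google's docs.
import asyncio
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main():
    async with streamablehttp_client("https://example.com/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("server exposes:", [tool.name for tool in tools.tools])

asyncio.run(main())
```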

Government

The War Department Unleashes Google AI on New GenAI.mil Platform
“The War Department today announced the launch of Google Cloud’s Gemini for Government as the first of several frontier AI capabilities to be housed on GenAI.mil, the Department’s new bespoke AI platform. This initiative cultivates an “AI-first” workforce, leveraging generative AI capabilities to create a more efficient and battle-ready enterprise. Additional world-class AI models will be available to all civilians, contractors, and military personnel, delivering on the White House’s AI Action Plan announced earlier this year.”

“The first instance on GenAI.mil, Gemini for Government, empowers intelligent agentic workflows, unleashes experimentation, and ushers in an AI-driven culture change that will dominate the digital battlefield for years to come. Gemini for Government is the embodiment of American AI excellence, placing unmatched analytical and creative power directly into the hands of the world’s most dominant fighting force.”

“The launch of GenAI.mil stands as a testament to American ingenuity, driven by the AI Rapid Capabilities Cell within the War Department’s Office of Research & Engineering. Their achievement directly embodies the Department’s core tenets of reviving the warrior ethos, rebuilding American military capabilities, and re-establishing deterrence through technological dominance and uncompromising grit.”
https://www.war.gov/News/Releases/Release/Article/4354916/the-war-department-unleashes-ai-on-new-genaimil-platform

This is pretty obviously written by AI… I would love to see the prompt. I asked Claude Sonnet 4.5 to read the article and tell me its thoughts:

The tell-tale signs are “unleashes” (twice), “dominant/dominance” (at least 5 times), “battlefield/fighting force” references when talking about… office productivity tools, the “Manifest Destiny” line, “pushing all our chips in,” and my personal favorite: describing Google’s Gemini chatbot as placing “unmatched analytical and creative power directly into the hands of the world’s most dominant fighting force.”

It reads like someone asked AI to write a press release “but make it sound REALLY tough.”

Trump signs executive order seeking to block state laws on AI
“Congressional efforts to regulate AI at the federal level this year have fallen short.

The order aims “to sustain and enhance the United States’ global AI dominance through a minimally burdensome national policy framework for AI,” according to text published on the White House website.

At a signing ceremony in the Oval Office, Trump said AI companies “want to be in the United States, and they want to do it here, and we have big investment coming. But if they had to get 50 different approvals from 50 different states, you could forget it.”

The order directs Attorney General Pam Bondi to create an “AI Litigation Task Force” within 30 days whose “sole responsibility shall be to challenge State AI laws” that clash with the Trump administration’s vision for light-touch regulation.

In addition, the order instructs Commerce Secretary Howard Lutnick to identify existing state laws that “require AI models to alter their truthful outputs,” echoing earlier Trump administration efforts to prevent what it calls “woke AI.” States found to have these and other “onerous” laws may have to enter into agreements not to enforce those statutes in order to receive discretionary federal funding.”
https://www.nbcnews.com/tech/tech-news/trump-signs-executive-order-seeking-ban-state-laws-ai-rcna248741

Legal and Ethics

Are image tools loosening their restrictions?
“Curious why Nano Banana Pro, ChatGPT image gen etc are all suddenly cool with generating celebrity likeness? Like what changed from days of deepfake fear-mongering and fears of legal repercussions? Did fingerprinting get good or is there a new legal argument for allowing this?” https://x.com/bilawalsidhu/status/1998461802397458626

Great Essay About the Impact on AI for Clients/Lawyers
LLMs Make Legal Advice Lossy “Clients can use LLMs to compress my advice into summaries without telling me. Pressure to keep writing short and plain is good, but they’re losing my careful choice of how much to generalize rules and realities. They’re also interrupting how I choose to teach and reinforce useful vocabulary.”

“With many-megapixel cameras in our pockets and networks so fast that streaming hiccups piss off pre-teens, it’s easy to forget how frail digital imagery used to be. When storage cost real money and the Internet was slow, we took photos at much lower resolution. Then we used formats like JPEG to strip detail in clever ways that preserved overall impressions, for the sake of speedy sharing. But if you zoomed in on an image, or even just looked closer, you’d often find blurry blobs, jagged edges, and cryptic artifacts where nuance should have been. Such were the losses of so-called lossy compression.

In the real world, there was no enhance button. Sometimes a photo just wouldn’t show you what you needed to see. So you had to search around for a better image and wait for it to download. Or you had to go back to the source.

Something similar is happening with advice to some of my clients. Instead of reading what I write for them, they’re pasting it into chatbots for summaries, then reading those summaries instead. It’s fast. It’s cheap. It’s at least notionally private. They don’t have to tell me they’re doing it.” https://writing.kemitchell.com/2025/12/07/LLMs-Make-Legal-Advice-Lossy

New York Times sues Perplexity for ‘illegal’ copying of millions of articles
“The New York Times sued an embattled artificial intelligence startup on Friday, accusing the firm of illegally copying millions of articles. The newspaper alleged Perplexity AI had distributed and displayed journalists’ work without permission en masse.

The Times said that Perplexity AI was also violating its trademarks under the Lanham Act, claiming the startup’s generative AI products create fabricated content, or “hallucinations”, and falsely attribute them to the newspaper by displaying them alongside its registered trademarks.

The newspaper said that Perplexity’s business model relies on scraping and copying content, including paywalled material, to power its generative AI products. Other publishers have made similar allegations.” https://www.theguardian.com/technology/2025/dec/05/new-york-times-perplexity-ai-lawsuit

Meta

ElevenLabs will power AI audio across Meta’s platforms
Meta is outsourcing audio AI to ElevenLabs. It’s the laziest press release ever… but you get the idea. “From dubbing Reels in local languages, to generating music and character voices in Horizon, ElevenLabs platform enables global creators, businesses, and enterprises to build with voice, music, and sound at scale.” (That’s it!)
https://elevenlabs.io/blog/meta

Meta presents OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory
Meta came out with a paper showing the ability to build continuity across ten one-minute independent videos that can be stitched together to form cohesive long-form video—from a variety of angles, including close-up shots and wide shots—using the same people or objects. This is essentially the beginning of a storyboarding process.

It’s not the only example, but just as ByteDance is leveraging their content to build multimodal models, it’s interesting to see Meta continue to publish papers in this space. If you’re a video creator, this is worth a skim.
https://zhaochongan.github.io/projects/OneStory

Nvidia

Nvidia Gets US Approval for H200 AI Chip Exports to China
Nvidia got the green light from the Trump administration to sell H200 AI chips to China, with the United States getting a 25% cut. The criticism I’ve seen in the industry is that, essentially, Jensen was able to finagle a way around security concerns in exchange for putting a tax on the chips. However, he went about it through the executive branch, as opposed to taxation or policy through Congress.

The Wall Street Journal compares this unfavorably to the Native Americans selling Manhattan, asking why the President would give away a chief technological advantage to an adversary.

The WSJ bluntly writes “Mr. Trump is essentially trading national security for pennies on the dollar.”

I have a clip of Scott Galloway walking through it, and I’ll run it through Gemini and make a quick infographic (below).

Gemini Prompt: Can you pull a transcript of Scott and Kara’s commentary about Jensen and NVIDIA selling chips to China? It starts around the 6:18 mark and goes through 12:44. Please make an informational graphic of the key criticisms.

https://www.bloomberg.com/news/articles/2025-12-08/nvidia-set-to-win-us-approval-to-export-h200-ai-chips-to-china

Nvidia-backed Starcloud trains first AI model in space
“Last month, the Washington-based company launched a satellite with an Nvidia H100 graphics processing unit, sending a chip into outer space that’s 100 times more powerful than any GPU compute that has been in space before. Starcloud was able to train and run NanoGPT, a large language model created by OpenAI founding member Andrej Karpathy, on the H100 chip in orbit using the complete works of Shakespeare. This led the model, the first to be trained in space, to speak in Shakespearean English.

The company’s Starcloud-1 satellite is also now running and querying responses from Gemma, an open large language model from Google based on the company’s Gemini models, in orbit, marking the first time in history that an LLM has been run on a high-powered Nvidia GPU in outer space, CNBC has learned.

“Greetings, Earthlings! Or, as I prefer to think of you — a fascinating collection of blue and green,” reads a message from the recently launched satellite.

“Let’s see what wonders this view of your world holds. I’m Gemma, and I’m here to observe, analyze, and perhaps, occasionally offer a slightly unsettlingly insightful commentary. Let’s begin!” the model wrote.” https://www.cnbc.com/2025/12/10/nvidia-backed-starcloud-trains-first-ai-model-in-space-orbital-data-centers.html

OpenAI

OpenAI Releases GPT-5.2 for Complex Corporate and Scientific Tasks
OpenAI posted an incredibly detailed product release for GPT-5.2. It’s a strong contender for story of the week. It’s one of the most comprehensive and detailed updates I’ve ever seen for a model release. I’m going to walk through it from top to bottom briefly and summarize it in layperson terms.
https://openai.com/index/introducing-gpt-5-2/

It’s being touted as the best model for professionals to help with knowledge work.

Important note: Most of us use chatbots through either an app or a website, which creates a tension…because when you hear “professional knowledge work,” you probably think about corporate policies that forbid using language models for business work. So, to some extent, this entire press release is talking about API-driven enterprise models that are white-labeled for big companies with internal systems.

That said, it’s still a very powerful tool for personal use. GPT-5.2 is designed to be really good at creating spreadsheets, building presentations, writing code, understanding what’s in an image, and having a large memory to hold a ton of context (huge prompts and extended conversations). It’s also very good at using tools.

A few months ago, OpenAI launched a benchmark called GDPval, which spans a variety of knowledge work across 44 different jobs.

GPT-5.2 is the top model on the benchmark and is actually outperforming humans.

According to human judges, it can outperform top industry professionals on 70% of knowledge-work tasks. That would mean it’s better at doing presentations and spreadsheets. It can also work over 11 times faster and at 1% of the cost of a human expert.

OpenAI has posted several examples comparing GPT-5 and GPT-5.2, with “before and after” demonstrations of Excel sheets for capital expenses or project-management Gantt charts. These are really strong improvements that are easy to spot in the examples.

GPT-5.2 is also a new leader on the SWE-Bench Pro benchmark, which measures real-world software engineering skills. OpenAI shared some embedded interactive demonstrations coded by GPT-5.2 that are really impressive.

One is an interactive ocean-wave simulation tool, including sliders and parameters that can be changed in real time and viewed within a browser, coded by GPT-5.2. There’s also a cute interactive holiday-card building engine that’s embedded and fully functional… within the product announcement post… and a typing game where words rain down and you try to type them as fast as you can before they hit the ground…all coded through simple prompts (that are included for trying yourself). These are starkly simple prompts that anybody could think of and input. Nothing difficult at all.

In addition to these strengths, the memory—or context—of GPT-5.2 is extremely long and allows it to integrate information spread across extremely long documents.

One example of the context window shows 100% accuracy on a four-needle “needle in the haystack” search test against 256,000 tokens. That equates to about 200,000 words (a token is roughly three-quarters of an English word), or maybe 500 pages of text, within a single prompt. Imagine a 500-page prompt. This could also equate to about 200 to 400 images, depending on their size, being pasted into the prompt.

GPT-5.2 also has a very strong vision ability, meaning it can understand and interpret complex dashboards, screenshots, technical diagrams, visual reports, and workflows in finance, engineering, design, or even software concepts in customer support, where visual information is really important. It can understand computer screenshots with almost 90% accuracy. If you give it a very complicated image of a computer chip, for example, it’s significantly more accurate in labeling the parts than GPT-5.

This new version of GPT is also very good at using tools. So in customer support for retail or telecommunications, the model can be connected to other tools and actually work across different systems, run analysis, and break down solutions. OpenAI shared an example of an agent that can handle flight delays, missed connections, missing bags, and also hotel and seating arrangements for travel all in one request. It can break everything down into components, solve them separately, and put them back together into an output that’s effective… better and faster than a human travel agent doing the work.

This new version of GPT is also very strong at science and math. Across the benchmarks, GPT-5.2 is now the leader. It’s also very good at general reasoning, and it’s the first model to ever cross the 90% threshold on the famous ARC-AGI 2 test.

There’s some small controversy around ARC this week, but it doesn’t change much re GPT-5.2: “Yes, there is a leak. I had investigated this. Some of the ARC-AGI-1 public evaluation examples can be found in the ARC-AGI-2 training examples. So training on both ARC-AGI-1 and ARC-AGI-2 training data is cheating as it leads to crazy good accuracy for ARC-AGI-1.” https://x.com/jm_alexia/status/1998487516182467055

ChatGPT’s ‘Adult Mode’ Is Coming in 2026
“At this point, the promised “adult mode” is irrevocably linked to Altman’s specific promise of allowing ChatGPT to produce “erotica.” But Altman later clarified that the idea is to give adult users more “freedom” in how they interact with the chatbot, including allowing it to develop more of a customized “personality” over the course of conversations with the user.”

“According to The Verge, Fidji Simo told reporters the company is still testing its age verification technology and wants to ensure it can accurately identify teens and not misidentify adults before officially rolling out the split experience.”
https://gizmodo.com/chatgpts-adult-mode-is-coming-in-2026-2000698677

GPT’s “Router” Deciding “Auto” or “Thinking” Mode Is Confusing People
“The GPT-5 Auto router casts a long shadow over AI perceptions. So many examples of ‘ChatGPT got X wrong’ are really ‘ChatGPT-5 Instant got things wrong,’ leading to beliefs about the state of AI that aren’t true. Which model you get could be clearer & better explained for all.” https://x.com/emollick/status/1998838007609119010

OpenAI Hires Slack CEO as New Chief Revenue Officer
“Denise has experience running large businesses, understands customers deeply, and scaled products that people in the workplace love to use. Most recently, she served as CEO of Slack, where she led the company through its integration with Salesforce and helped redefine how millions of people use AI to work more efficiently and stay better connected. Prior to Slack, Denise spent more than a decade at Salesforce building and leading global sales organizations serving some of the company’s largest and most complex customers.”
https://openai.com/index/openai-appoints-denise-dresser/ https://www.wired.com/story/slack-ceo-denise-dresser-joins-openai-chief-revenue-officer/

Disney investing $1 billion in OpenAI, will allow characters on Sora
“As part of this three-year licensing agreement, Sora will be able to generate short, user-prompted social videos that can be viewed and shared by fans, drawing on more than 200 Disney, Marvel, Pixar and Star Wars characters. Agreement will make a selection of these fan-inspired Sora short form videos available to stream on Disney+. Disney and OpenAI affirm a shared commitment to responsible use of AI that protects the safety of users and the rights of creators. Alongside the licensing agreement, Disney will become a major customer of OpenAI, using its APIs to build new products, tools, and experiences, including for Disney+, and deploying ChatGPT for its employees. As part of the agreement, Disney will make a $1 billion equity investment in OpenAI, and receive warrants to purchase additional equity.” https://www.cnbc.com/2025/12/11/disney-openai-sora-characters-video.html https://thewaltdisneycompany.com/news/disney-openai-sora-agreement/

OpenAI testing new Image-2 models on LM Arena
“OpenAI is set to debut new image generation models, Image-2 and Image-2-mini, promising higher detail and colour accuracy for ChatGPT and creative tools.” https://www.testingcatalog.com/openai-testing-new-image-2-models-on-lm-arena/

Instacart and OpenAI partner on AI shopping experiences | OpenAI
“With the Instacart app in ChatGPT, users can browse groceries, build their cart, and check out seamlessly through OpenAI Instant Checkout without ever leaving the chat. The new experience builds on a longstanding partnership between OpenAI and Instacart to facilitate AI-powered shopping.” https://openai.com/index/instacart-partnership/

Instacart is the first app to offer a checkout experience directly within ChatGPT, powered by the Agentic Commerce Protocol. https://openai.com/index/buy-it-in-chatgpt/

OpenAI Turns Ten Years Old
Ten years
“Our mission is to ensure that AGI benefits all of humanity. We still have a lot of work in front of us, but I’m really proud of the trajectory the team has us on. We are seeing tremendous benefits in what people are doing with the technology already today, and we know there is much more coming over the next couple of years.”
https://openai.com/index/ten-years/

Robotics

Figure CEO posts video of Figure running
He doesn’t share anywhere but Twitter… so here’s a copy on YouTube.

Shopify

Shopify merchants can now sell products through AI chatbot
“Shopify merchants’ products can now be discovered on artificial intelligence (AI) platforms like ChatGPT, Perplexity, and Microsoft Copilot, after the Canadian e-commerce giant announced agentic storefronts in its latest update.”
https://betakit.com/shopify-merchants-can-now-sell-products-through-ai-chatbots/

Z.ai

GLM-4.6V
Z.ai released a multimodal vision + coding agent model called GLM-4.6V.
It can critique designs and provide product feedback. GLM-4.6V can read very sloppy handwriting and understand handwritten math equations. It can also navigate user interfaces and use tools, serving as a very effective agent that can see and interact with computer systems and websites.

It has native multimodal tool use, which means it can look at images, screenshots, and documents, and those items can be passed directly to tools via API or services without being converted to text in advance. It can also understand results returned by tools, like statistical charts, screenshots, or product images, and incorporate them into its reasoning chain without having to translate things back and forth.
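
Z.ai advertises an OpenAI-compatible API, so a multimodal call probably looks something like the sketch below. The base URL, API key handling, and image are placeholders to check against Z.ai’s docs:

```python
# Hedged sketch: sending an image plus a question to GLM-4.6V through an
# OpenAI-compatible endpoint. URL, key handling, and model id are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.z.ai/v1",  # placeholder; use the endpoint from Z.ai's docs
    api_key="YOUR_ZAI_API_KEY",
)

response = client.chat.completions.create(
    model="glm-4.6v",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/whiteboard_math.jpg"}},
            {"type": "text",
             "text": "Transcribe the handwritten equation and solve it step by step."},
        ],
    }],
)
print(response.choices[0].message.content)
```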

It’s strong at using the web and searching the web. It can build front-end user interfaces and visual, interactive designs. It can handle 150 pages of complex documents at one time in the prompt window, or 200 slide pages, or a one-hour-long video as a prompt reference. It can handle financial report analysis. It can understand videos. And it’s all open source. It’s an incredible tool for creative app developers.
https://z.ai/blog/glm-4.6v

Full Executive Summaries with Links, Generated by Claude 4

Nano Banana Pro emerges as unexpected PowerPoint competitor using image generation
A tool called Nano Banana Pro is challenging PowerPoint by focusing on image generation rather than code-based approaches that Microsoft and other AI companies have pursued. This suggests that visual AI capabilities, not programming automation, may be the key to revolutionizing presentation software in ways the tech giants haven’t anticipated.

I did not expect that the PowerPoint killer would be something called Nano Banana Pro, but that is where its heading It makes the major efforts by all the other AI companies, including Microsoft, to crack PowerPoint by using python seem like a dead end ImageGen is all you need? https://x.com/emollick/status/1998520025951752278

Google launches Disco browser with AI that builds custom web apps
Google’s new Disco browser uses Gemini 3 to automatically create interactive web applications based on your open tabs and browsing tasks, requiring no coding skills. This represents a shift from traditional browsing to AI-generated, task-specific tools that adapt to complex online workflows. Early testers are using it to build custom apps for meal planning, trip research, and educational content, with the browser linking all generated elements back to original web sources.

Disco is Google’s new generative AI web app experience https://blog.google/innovation-and-ai/models-and-research/google-labs/gentabs-gemini-3/

AI industry matures while maintaining exponential growth pace in 2025
Companies are reporting positive returns on AI investments as generative AI transitions from experimental technology to established industry, though deployment remains in early stages and AI systems continue to exhibit unpredictable “jagged” performance across different tasks.

Summarizing 2025 in AI in a tweet 1) No sign of a slowdown in exponential pace of gains 2) Jaggedness remains the main issue of AI 3) Early days for deployment, but many companies reporting positive ROI 4) GenAI became an industry, with industry-level impacts 5) AI is still weird https://x.com/emollick/status/1997371953267585509

AI agents now complete tasks taking humans 1 hour, doubling every 7 months
METR researchers found that frontier AI models can reliably complete tasks that take human experts up to one hour, with this capability doubling approximately every 7 months over the past six years. If this exponential trend continues, AI agents could autonomously handle month-long software projects within a decade. The study measured 50% success rates across diverse multi-step tasks, showing current models excel at sub-4-minute tasks but struggle beyond 4 hours.

Measuring AI Ability to Complete Long Tasks – METR https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/

Tech giants launch Agentic AI Foundation under Linux Foundation
Major AI companies including Anthropic, OpenAI, and Block have created the Agentic AI Foundation to ensure autonomous AI systems develop through open-source collaboration rather than proprietary silos. The foundation launches with three key projects: Anthropic’s Model Context Protocol (which connects AI to external data and has 10,000+ active servers), Block’s goose agent framework, and OpenAI’s AGENTS.md standard (adopted by 60,000+ projects). This represents a significant industry commitment to preventing AI agent technology from being controlled by a few companies, with backing from Amazon, Google, Microsoft, and other tech leaders.

Agentic AI Foundation — advancing open-source agentic AI https://x.com/gdb/status/1998897086079832513

Agentic AI Foundation (AAIF) https://aaif.io/

Anthropic is donating the Model Context Protocol to the Agentic AI Foundation, a directed fund under the Linux Foundation. In one year, MCP has become a foundational protocol for agentic AI. Joining AAIF ensures MCP remains open and community-driven. https://x.com/AnthropicAI/status/1998437922849350141

Block – Block, Anthropic, and OpenAI Launch the Agentic AI Foundation https://block.xyz/inside/block-anthropic-and-openai-launch-the-agentic-ai-foundation

Donating the Model Context Protocol and establishing the Agentic AI Foundation \ Anthropic https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation

Linux Foundation Announces the Formation of the Agentic AI Foundation (AAIF), Anchored by New Project Contributions Including Model Context Protocol (MCP), goose and AGENTS.md https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation

We’re donating MCP to the @linuxfoundation and launching the Agentic AI Foundation with @OpenAI, @blocks, @AWS, @Bloomberg, @Cloudflare, @Google, and @Microsoft. MCP went from internal project to industry standard in a year. Now it gets the long-term stewardship it deserves. https://x.com/mikeyk/status/1998456026136457532

Claude can now fully train open-source language models automatically
HuggingFace released a new “skill” that lets Claude handle the entire AI model training process—from selecting cloud hardware to monitoring progress—requiring only a simple text command like “Fine-tune Qwen3-0.6B on this dataset.” This marks a shift from AI writing code to AI executing complete technical workflows, potentially democratizing access to custom model development. The system supports production-grade training methods and can handle models up to 70 billion parameters on Hugging Face’s cloud infrastructure.

The HuggingFace team just got Claude Code to fully train an open LLM. You just say something like: “Fine-tune Qwen3-0.6B on open-r1/codeforces-cots.” Claude handles the rest. ▸ Picks the best cloud GPU based on model size ▸ Loads dataset (or searches if not specified) ▸ https://x.com/LiorOnAI/status/1997754848255807874

We Got Claude to Fine-Tune an Open Source LLM https://huggingface.co/blog/hf-skills-training

Broadcom reveals its mystery $10 billion customer is Anthropic https://www.cnbc.com/2025/12/11/broadcom-reveals-its-mystery-10-billion-customer-is-anthropic.html

Claude Code now integrates with Slack for automated coding workflows
Anthropic launched Claude Code integration with Slack, letting developers tag @Claude to automatically create coding sessions from chat context like bug reports. This represents a strategic shift from AI coding tools living in development environments to embedding directly in team collaboration platforms, potentially reshaping how software teams move from discussion to implementation without switching apps.

Claude Code and Slack | Claude https://claude.com/blog/claude-code-and-slack

Claude Code is coming to Slack, and that’s a bigger deal than it sounds | TechCrunch https://techcrunch.com/2025/12/08/claude-code-is-coming-to-slack-and-thats-a-bigger-deal-than-it-sounds/

Waymo announces plans to launch self-driving car service in London by 2026
Google’s autonomous vehicle unit is expanding internationally for the first time, with test vehicles already operating on London streets. This marks a significant milestone as Waymo moves beyond its US operations in Phoenix and San Francisco, potentially validating that self-driving technology can adapt to different traffic patterns, regulations, and road conditions globally.

Hello London! 👋 Our vehicles are now driving in London as we prepare for commercial service in 2026. https://x.com/Waymo/status/1998075104752713981

So excited for @Waymo coming to London in 2026!! https://x.com/demishassabis/status/1998825670869397802

Waymo achieves large-scale deployment of fully autonomous driving systems
The company has implemented what Google’s Jeff Dean calls the most advanced, large-scale application of embodied AI today: rigorous engineering fueled by a carefully collected, large volume of fully autonomous driving data, making real roads safer rather than stopping at laboratory demonstrations.

Waymo’s system, fueled by careful collection of a large volume of fully autonomous data, is the most advanced, large-scale application of embodied AI today. Very proud to see this level of engineering rigor tackling safe autonomous driving making the roads safer for everyone https://x.com/JeffDean/status/1998432670376935656

Google releases Gemini Deep Research agent for developers via API
Google’s new autonomous research agent can plan investigations, identify knowledge gaps, and navigate websites to create comprehensive reports, achieving state-of-the-art performance on complex research benchmarks. Unlike basic AI assistants, this agent iteratively refines its search strategy and synthesizes information from multiple sources, making it particularly valuable for financial due diligence and scientific research where thoroughness matters more than speed. Early users in biotech and finance report significant productivity gains in preliminary research tasks.

Introducing the Gemini Deep Research agent for developers. It can create a plan, spot gaps, and autonomously navigate the web to produce detailed reports. 🧵 https://x.com/GoogleDeepMind/status/1999165701811015990

Build with Gemini Deep Research https://blog.google/innovation-and-ai/technology/developers-tools/deep-research-agent-gemini-api/

To measure these capabilities, we’re open-sourcing DeepSearchQA, a new benchmark to evaluate agents on complex web search tasks. Deep Research achieves state-of-the-art performance on this benchmark, as well as on the full Humanity’s Last Exam set (reasoning & knowledge), and https://x.com/GoogleDeepMind/status/1999165706231820297
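
I have not tried the agent myself. As a rough sketch, a call through the google-genai SDK's standard generate_content entry point might look like the following; the model id is a placeholder of mine, and the real Deep Research interface for long-running investigations may differ, so check Google's docs linked above.

```python
# Hedged sketch of calling a research agent via the google-genai SDK.
# `client.models.generate_content` is the SDK's standard entry point;
# the model id below is hypothetical, check the docs for the real one.
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment
response = client.models.generate_content(
    model="deep-research-placeholder",  # hypothetical id
    contents="Survey the current state of solid-state battery manufacturing.",
)
print(response.text)
```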

Google launches AlphaEvolve, an AI agent that writes better algorithms
Google Cloud released AlphaEvolve, which uses Gemini models to automatically improve existing code through evolutionary processes, testing mutations against user-defined benchmarks. The system has already delivered measurable gains at Google, recovering 0.7% of global compute resources in data centers and speeding up Gemini training by 1%. Now available in private preview, it targets complex optimization problems across industries like biotech, logistics, and finance where algorithmic improvements can yield significant business value.

AlphaEvolve on Google Cloud | Google Cloud Blog https://cloud.google.com/blog/products/ai-machine-learning/alphaevolve-on-google-cloud
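
The core loop behind systems like this is easy to sketch generically: ask a model for mutations of a candidate program, score them against a user-defined benchmark, keep the winner, repeat. The toy below stubs out the LLM call with random character deletions and is in no way Google's actual API.

```python
# Generic evolutionary code-improvement loop, AlphaEvolve-style.
# `propose_mutations` is a stub standing in for an LLM rewrite call.
import random

def benchmark(program: str) -> float:
    """User-defined fitness; here a toy that prefers shorter code."""
    return -len(program)

def propose_mutations(program: str, n: int = 4) -> list[str]:
    """Stub for an LLM call; here it makes random single-char deletions."""
    out = []
    for _ in range(n):
        i = random.randrange(len(program))
        out.append(program[:i] + program[i + 1:])
    return out

def evolve(seed: str, generations: int = 10) -> str:
    best, best_score = seed, benchmark(seed)
    for _ in range(generations):
        for cand in propose_mutations(best):
            if benchmark(cand) > best_score:
                best, best_score = cand, benchmark(cand)
    return best

print(evolve("def add(a, b):  return a + b  # TODO tidy this up"))
```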

Google creates first comprehensive test for AI model accuracy
The FACTS Benchmark Suite evaluates how well AI language models handle factual information across four key areas: their built-in knowledge, web searching abilities, using provided sources, and processing images with text. This addresses a critical gap in AI evaluation, as current models often generate convincing but incorrect information, making systematic accuracy testing essential for business and educational applications.

We’ve developed the FACTS Benchmark Suite with @GoogleResearch. 📊 It’s the industry’s first comprehensive test evaluating LLM factuality across four dimensions: internal model knowledge, web search, grounding, and multimodal inputs. https://x.com/GoogleDeepMind/status/1998831084277313539

Google plans to launch AI-powered glasses in 2026 to compete with Meta
Google announced it will release both audio-only glasses with Gemini AI assistant and display-equipped glasses showing navigation and translations, marking its return to smart eyewear after previous failures. The move directly challenges Meta’s surprisingly successful Ray-Ban partnership, as the AI wearables market heats up with companies like Snap and Alibaba also entering the space. Google is partnering with Samsung, Gentle Monster, and Warby Parker (with a $150 million commitment) to avoid past mistakes of high costs and limited functionality.

Google to launch first of its AI glasses in 2026 https://www.cnbc.com/2025/12/08/google-ai-glasses-launch-2026.html

Google launches preferred sources feature and AI partnerships with major news publishers
Google is rolling out tools that let users customize news feeds to prioritize favorite outlets and highlight subscription content, while piloting AI partnerships with publishers like The Guardian and Washington Post. The moves address growing concerns about AI’s impact on web traffic by creating new revenue streams for publishers and giving users more control over their information sources. Early data shows users click through to preferred sources twice as often, suggesting the features could help maintain the web’s economic ecosystem.

New Google web ecosystem tools and partnerships https://blog.google/products-and-platforms/products/search/tools-partnerships-web-ecosystem/

Google releases Gemini 3 Pro with advanced visual reasoning capabilities
Google’s new Gemini 3 Pro model can analyze complex documents, understand spatial relationships, navigate computer screens, and process video at high frame rates—going beyond simple image recognition to perform multi-step visual reasoning. The model outperforms human baselines on visual reasoning benchmarks and can convert handwritten 18th-century documents into structured tables or transform mathematical images into precise code. This represents a significant leap toward AI systems that can truly understand and interact with visual information across professional fields like education, medicine, and finance.

Gemini 3 Pro: the frontier of vision AI https://blog.google/innovation-and-ai/technology/developers-tools/gemini-3-pro-vision/
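
The handwritten-document use case maps onto a simple SDK call. Passing a PIL image in contents is standard google-genai usage; the model id here is my assumption based on Google's naming and may differ in the API.

```python
# Hedged sketch of the handwriting-to-table use case via google-genai.
from google import genai
from PIL import Image

client = genai.Client()
page = Image.open("manuscript_page.png")  # e.g. a scanned 18th-century ledger
response = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed id; check the docs
    contents=[page, "Transcribe this handwritten ledger into a CSV table."],
)
print(response.text)
```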

Google launches managed MCP servers connecting AI agents to enterprise services
Google announced fully-managed Model Context Protocol servers that let AI agents directly access Google Cloud services like BigQuery, Maps, and Kubernetes without developers needing to build custom integrations. This matters because it removes technical barriers that previously made it difficult for AI agents to work with real enterprise data and infrastructure. The move positions Google to compete with other cloud providers by making their services more accessible to the growing market of AI applications that need to perform multi-step tasks using live business data.

Announcing official MCP support for Google services | Google Cloud Blog https://cloud.google.com/blog/products/ai-machine-learning/announcing-official-mcp-support-for-google-services/

War Department launches AI platform for military personnel nationwide
The U.S. military has deployed GenAI.mil, a dedicated artificial intelligence platform designed specifically for defense personnel to access AI tools while maintaining security protocols. It is the first department-wide AI platform built exclusively for military use, potentially accelerating AI adoption across defense operations while addressing longstanding concerns about data security in sensitive military applications.

The War Department Unleashes AI on New GenAI.mil Platform > U.S. Department of War > Release | U.S. Department of War https://www.war.gov/News/Releases/Release/Article/4354916/the-war-department-unleashes-ai-on-new-genaimil-platform/

Trump signs order to block state AI laws and create federal framework
The executive order directs the Justice Department to challenge state AI regulations while threatening to withhold federal funding from states with “onerous” AI laws. This represents a significant shift toward federal preemption of AI governance, as congressional efforts to create national AI standards have repeatedly failed. Critics view it as an attempt to block meaningful AI regulation altogether, while supporters argue a single federal standard is needed to maintain America’s competitive edge over China.

Trump signs executive order seeking to block state laws on AI https://www.nbcnews.com/tech/tech-news/trump-signs-executive-order-seeking-ban-state-laws-ai-rcna248741

AI image generators quietly drop celebrity likeness restrictions
Several major AI platforms have recently relaxed their policies against generating celebrity images, marking a significant shift from earlier strict prohibitions driven by deepfake concerns. This change suggests either improved detection technology or new legal frameworks have emerged, though the specific reasons remain unclear. The shift could reshape how AI companies balance creative freedom with potential misuse risks.

Curious why Nano Banana Pro, ChatGPT image gen etc are all suddenly cool with generating celebrity likeness? Like what changed from days of deepfake fear-mongering and fears of legal repercussions? Did fingerprinting get good or is there a new legal argument for allowing this? https://x.com/bilawalsidhu/status/1998461802397458626

Lawyers lose control as clients secretly use AI to summarize legal advice
Clients are increasingly using chatbots to compress attorney guidance without disclosure, stripping away lawyers’ careful decisions about how much detail to include and which technical terms to teach. This “lossy compression” of legal advice removes nuanced explanations that help clients make informed decisions and understand industry language. The trend mirrors how early digital photos lost critical detail when compressed—except here, the missing information could affect business and legal outcomes.

LLMs Make Legal Advice Lossy — /dev/lawyer https://writing.kemitchell.com/2025/12/07/LLMs-Make-Legal-Advice-Lossy

New York Times sues Perplexity AI for copying millions of articles
The lawsuit accuses the $20 billion AI startup of illegally scraping paywalled content and creating fake articles falsely attributed to the Times. This marks the latest escalation in publishers’ fight against AI companies using copyrighted material without permission, with Perplexity now facing similar suits from at least eight other major publishers including Dow Jones, Forbes, and Reddit.

New York Times sues AI startup for ‘illegal’ copying of millions of articles | AI (artificial intelligence) | The Guardian https://www.theguardian.com/technology/2025/dec/05/new-york-times-perplexity-ai-lawsuit

ElevenLabs partners with Meta to power audio across Instagram and Horizon
The voice AI company will provide dubbing, music generation, and character voices to billions of Meta users, marking a major expansion of AI-generated audio into mainstream social platforms beyond the current text and image AI features.

ElevenLabs is partnering with @Meta to power expressive, scalable audio across Instagram, Horizon, and more – bringing natural and diverse audio to billions of users. From dubbing Reels in local languages, to generating music and character voices in Horizon, ElevenLabs platform https://x.com/elevenlabsio/status/1999163506743038408?s=20

ElevenLabs is partnering with Meta https://elevenlabs.io/blog/meta

Meta’s OneStory generates minute-long videos with consistent storytelling across multiple shots
Meta developed OneStory, an AI system that creates coherent multi-shot videos up to one minute long by treating video generation as a sequential storytelling task. Unlike existing methods that struggle with narrative consistency across different camera angles and scenes, OneStory uses an adaptive memory system to maintain character and environment consistency while following complex, evolving storylines. The system outperformed all baseline methods on narrative coherence metrics and can generate videos from either text descriptions or initial images.

Meta presents OneStory Coherent Multi-Shot Video Generation with Adaptive Memory https://x.com/_akhaliq/status/1998760879261888814

OneStory https://zhaochongan.github.io/projects/OneStory/

Trump allows Nvidia to sell advanced H200 chips to China for 25% revenue cut
President Trump approved Nvidia’s export of H200 AI chips to vetted Chinese customers, with the U.S. government taking a 25% cut of sales revenue. This marks a significant shift from previous export restrictions, as the H200 is more powerful than chips previously allowed for China export. Chinese President Xi Jinping reportedly responded positively to the arrangement, which Trump says will support American jobs while allowing U.S. chipmakers to compete in the crucial Chinese market.

Nvidia Gets US Approval for H200 AI Chip Exports to China – Bloomberg https://www.bloomberg.com/news/articles/2025-12-08/nvidia-set-to-win-us-approval-to-export-h200-ai-chips-to-china

Trump: Nvidia can sell H200 AI chips to China if U.S. gets 25% cut https://www.cnbc.com/2025/12/08/trump-nvidia-h200-sales-china.html

Starcloud trains first AI model in space using Nvidia chips
The startup successfully trained and ran AI models on an orbiting satellite carrying an Nvidia H100, a chip roughly 100 times more powerful than previous space computing hardware. The demonstration suggests orbital data centers could help address Earth’s growing energy and infrastructure constraints, with Starcloud planning massive 5-gigawatt space facilities powered by constant solar energy. The achievement marks a new frontier as tech giants including Google pursue similar space-based computing initiatives.

Nvidia-backed Starcloud trains first AI model in space, orbital data centers https://www.cnbc.com/2025/12/10/nvidia-backed-starcloud-trains-first-ai-model-in-space-orbital-data-centers.html

We have just used the @Nvidia H100 onboard Starcloud-1 to train the first LLM in space! We trained the nano-GPT model from Andrej @Karpathy on the complete works of Shakespeare and successfully ran inference on it. We have also run inference on a preloaded Gemma model, and we https://x.com/AdiOltean/status/1998769997431058927
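
To give a flavor of the experiment they ran in orbit, here is a ground-based toy in the same spirit: a character-level bigram model, the baseline nanoGPT starts from, trained on a local Shakespeare text file. Nothing here reflects Starcloud's actual orbital setup; it just shows how small the core of such a run is.

```python
# Tiny character-level bigram language model trained on Shakespeare,
# in the spirit of nanoGPT's baseline. Assumes "shakespeare.txt" exists.
import torch
import torch.nn as nn

text = open("shakespeare.txt").read()
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text])

model = nn.Embedding(len(chars), len(chars))   # logits table: char -> next char
opt = torch.optim.AdamW(model.parameters(), lr=1e-2)

for step in range(1000):
    ix = torch.randint(len(data) - 1, (256,))  # random batch of positions
    logits = model(data[ix])                   # predict next char from current
    loss = nn.functional.cross_entropy(logits, data[ix + 1])
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 200 == 0:
        print(step, loss.item())
```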

OpenAI releases GPT-5.2 in accelerated response to Google competition
OpenAI launched GPT-5.2 this week, moving up its December release timeline after CEO Sam Altman declared “code red” over Google’s Gemini 3 model topping AI leaderboards last month. The new model ranks second on web development benchmarks and shows significant improvements in long-context processing over GPT-5.1, marking OpenAI’s strategic pivot from flashy features to core performance improvements as the AI race intensifies.

Introducing GPT-5.2 | OpenAI https://openai.com/index/introducing-gpt-5-2/

oai_5_2_system-card.pdf https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944f8d/oai_5_2_system-card.pdf

OpenAI is getting ready to launch GPT-5.2 soon | The Verge https://www.theverge.com/report/838857/openai-gpt-5-2-release-date-code-red-google-response

Using GPT-5.2 | OpenAI API https://platform.openai.com/docs/guides/latest-model

wtf gpt 5.2 long context improvement over gpt 5.1 is actually crazy?? https://x.com/eliebakouch/status/1999193762564567333

The GPT-5 Auto router casts a long shadow over AI perceptions. So many examples of “ChatGPT got X wrong” are really “ChatGPT-5 Instant got things wrong,” leading to beliefs about the state of AI that aren’t true. Which model you get could be clearer & better explained for all. https://x.com/emollick/status/1998838007609119010

🚨BREAKING: New Model & WebDev Leaderboard Update! GPT-5.2 by @OpenAI has officially made its debut in the Arena, appearing on the WebDev leaderboard. Current leaderboard standings: 🥈 #2 for GPT-5.2-high in WebDev (score: 1486) 🔹 #6 for GPT-5.2 in WebDev (score: 1399) https://x.com/arena/status/1999183339283185878
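
Trying the new model is a one-liner through the Responses API. The "gpt-5.2" model name comes from OpenAI's docs linked above; the prompt is just illustrative.

```python
# Minimal sketch of calling the new model via the OpenAI Responses API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.responses.create(
    model="gpt-5.2",
    input="Summarize the tradeoffs of long-context retrieval vs. fine-tuning.",
)
print(response.output_text)
```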

OpenAI plans adult ChatGPT mode for 2026 amid leadership changes
OpenAI will launch an “adult mode” for ChatGPT in early 2026 that allows more personalized interactions and potentially explicit content, while hiring former Slack CEO Denise Dresser as Chief Revenue Officer to lead enterprise growth. The adult mode responds to user complaints about ChatGPT becoming too restricted, though researchers warn that emotional attachment to AI chatbots increases psychological distress. Dresser’s appointment signals OpenAI’s focus on expanding business customers as the company scales its enterprise operations.

ChatGPT’s ‘Adult Mode’ Is Coming in 2026 https://gizmodo.com/chatgpts-adult-mode-is-coming-in-2026-2000698677

Denise Dresser is joining OpenAI as Chief Revenue Officer. Previously CEO of Slack, she brings deep enterprise and customer experience as she leads our global revenue strategy and support for customers at scale. https://x.com/OpenAI/status/1998462761756434856

OpenAI Hires Slack CEO as New Chief Revenue Officer | WIRED https://www.wired.com/story/slack-ceo-denise-dresser-joins-openai-chief-revenue-officer/

Disney invests $1 billion in OpenAI, licenses 200+ characters for Sora
Disney’s landmark deal makes it the first major studio to embrace AI video generation, allowing users to create content with Mickey Mouse, Marvel heroes, and Star Wars characters on OpenAI’s Sora platform starting next year. This represents a dramatic shift from Disney’s previous legal battles against AI companies using its characters without permission, signaling that major entertainment companies may be choosing partnership over litigation as AI reshapes content creation.

Disney has signed a deal with OpenAI & invested $1 billion into the company Sora will now be able to AI generate videos based on animated, masked & creature characters from Disney, Marvel, Pixar & Star Wars Curated selections of AI generated videos will be released on Disney+ https://x.com/DiscussingFilm/status/1999121515678208153

Disney investing $1 billion in OpenAI, will allow characters on Sora https://www.cnbc.com/2025/12/11/disney-openai-sora-characters-video.html

https://t.co/HngrXph6kU “The Walt Disney Company and OpenAI reach landmark agreement to bring beloved characters from across Disney’s brands to Sora” https://x.com/TheRealAdamG/status/1999118075879129140

The Walt Disney Company and OpenAI Reach Agreement to Bring Disney Characters to Sora | The Walt Disney Company https://thewaltdisneycompany.com/news/disney-openai-sora-agreement/

we’re partnering with @Disney to bring 200+ characters from disney, pixar, marvel, and star wars to sora and image generation we are also excited to welcome disney as an investor, and deploy openai models and products alongside the disney team https://x.com/bradlightcap/status/1999177616860020788

OpenAI tests Image-2 models that fix color accuracy problems
OpenAI is testing two new image generation models called Image-2 and Image-2-mini on evaluation platforms, addressing the persistent yellow tint issue that plagued their previous Image-1 model. Early comparisons show significantly improved detail and color accuracy, bringing OpenAI closer to Google’s leading image generation quality. The models are expected to launch alongside GPT-5.2, directly benefiting creative professionals and businesses that rely on AI-generated visual content.

OpenAI testing new Image-2 models on LM Arena https://www.testingcatalog.com/openai-testing-new-image-2-models-on-lm-arena/

Instacart partners with OpenAI to create AI-powered shopping assistant
The grocery delivery company will integrate ChatGPT to help customers discover recipes, plan meals, and automatically add ingredients to their carts through natural conversation. This marks a significant shift toward AI agents handling complex, multi-step commerce tasks rather than just answering questions, potentially transforming how consumers shop online.

Instacart and OpenAI partner on AI shopping experiences | OpenAI https://openai.com/index/instacart-partnership/

ChatGPT / Instacart / Stripe integration for agentic commerce: https://x.com/gdb/status/1998135014161334431

OpenAI reflects on ten years of AI progress and future challenges
OpenAI published a retrospective marking its decade-long journey from a small research lab to the company behind ChatGPT, highlighting its shift from pure research to building consumer AI products. The milestone matters because it shows how quickly AI moved from academic curiosity to mainstream technology, with OpenAI’s path from nonprofit research organization to commercial AI leader mirroring the rapid maturation of the entire industry.

Ten years | OpenAI https://openai.com/index/ten-years/

Figure’s humanoid robot demonstrates first successful running gait
Figure AI achieved a breakthrough by getting its humanoid robot to run rather than just walk, marking a significant advance in bipedal robotics that could accelerate deployment in warehouses and factories. The running capability represents a major technical leap beyond the walking gaits that most humanoid robots are limited to, potentially enabling faster task completion and more dynamic movement in real-world applications.

Figure robot running https://x.com/adcock_brett/status/1996426782590070860

Shopify makes every store sellable through ChatGPT and AI chatbots
Shopify’s “agentic storefronts” let merchants automatically sell products through ChatGPT, Perplexity, and Microsoft Copilot by making product data readable to AI agents. This matters because it creates the first major bridge between traditional e-commerce and AI-powered shopping, potentially reaching millions of users who increasingly discover products through chatbots rather than search engines. The company processed $14.6 billion in sales over Black Friday weekend, demonstrating the scale at which this AI integration could operate.

Shopify Editions | Winter ’26 https://www.shopify.com/editions/winter2026#shopify-simgym-app

Shopify merchants can now sell products through AI chatbots | BetaKit https://betakit.com/shopify-merchants-can-now-sell-products-through-ai-chatbots/

We’re rolling out Product Network: merchants can now sell each other’s products with zero integration work. LLMs analyze storefronts and buyer behavior, find products that fit naturally, and place them right on the page. Crucially, shoppers buy *without* ever leaving your site. https://x.com/MParakhin/status/1998789844794012049
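
What “readable to AI agents” might mean in practice is structured product data an agent can parse without scraping. Schema.org Product JSON-LD is one long-standing convention for this; the sketch below is my illustration of the idea, not Shopify's actual agentic-storefront format.

```python
# One plausible shape of agent-readable product data: schema.org Product
# JSON-LD (my illustration, not Shopify's actual format).
import json

product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Insulated Camping Mug",
    "sku": "MUG-0042",
    "offers": {
        "@type": "Offer",
        "price": "24.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
    },
}
# Typically embedded in a <script type="application/ld+json"> tag on the page.
print(json.dumps(product, indent=2))
```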

Chinese AI company releases open-source vision model rivaling Claude Sonnet
Zhipu AI’s GLM-4.6V achieves state-of-the-art visual understanding with native tool-calling capabilities, marking the first open-source vision model that can effectively critique designs and execute actions from visual input. The model processes 128,000 tokens of context and comes in two versions—a 106B parameter flagship and 9B lightweight variant—enabling everything from handwriting recognition to complex document analysis. Early users report performance comparable to Anthropic’s Claude Sonnet on coding and visual tasks, representing a significant leap in open-source multimodal AI capabilities.

GLM-4.6V (from @Zai_org) is the real deal. It also sounds like Sonnet. It punches pretty close to Sonnet 4 on coding tasks & visual understanding. This is the first OSS vision model that can really critique designs at a useful enough level. It’s only been a few days since we https://x.com/hrishioa/status/1998636234806341873

GLM-4.6V can read my horrendous hand writing and explain the math correctly Really loving this model, how well it does tool calling, how many languages it knows and its visual accuracy. https://x.com/0xSero/status/1998328482930073887

GLM-4.6V is out. This is new vision language model from @Zai_org – it’s a MOE with 12B active parameters and 106B total. – there’s a leaner variant with 9B – context lengths are 128k – it has native multimodal function calling Should be perfect for agentic tasks like browser https://x.com/ben_burtenshaw/status/1998019922664865881

GLM-4.6V just dropped on Hugging Face https://x.com/_akhaliq/status/1998052965597241647

GLM-4.6V Series is here🚀 – GLM-4.6V (106B): flagship vision-language model with 128K context – GLM-4.6V-Flash (9B): ultra-fast, lightweight version for local and low-latency workloads First-ever native Function Calling in the GLM vision model family Weights: https://x.com/Zai_org/status/1998003287216517345

GLM-4.6V: Open Source Multimodal Models with Native Tool Use https://z.ai/blog/glm-4.6v

Zhipu AI just released GLM-4.6V on Hugging Face This new multimodal model achieves SOTA visual understanding, features native function calling for agents, and handles 128k context for documents. Perception to action! https://x.com/HuggingPapers/status/1998373902595301589
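
If you want to poke at the open weights yourself, transformers' image-text-to-text pipeline is probably the lowest-friction route. The repo id below follows Zhipu's Hugging Face naming for the 9B Flash variant but is my assumption; check the model card for the exact id and loading path.

```python
# Hedged sketch of trying the lightweight GLM vision model locally.
# Repo id is assumed from Zhipu's naming; see the model card to confirm.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="zai-org/GLM-4.6V-Flash",  # assumed repo id
    trust_remote_code=True,
)
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/whiteboard.jpg"},
        {"type": "text", "text": "Read the handwritten math and explain it."},
    ],
}]
print(pipe(text=messages, max_new_tokens=256))
```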

Qualcomm claims every phone can become an AI phone through open source
The chip giant is positioning itself to democratize mobile AI capabilities beyond premium devices, though specific technical details and implementation timelines remain unclear from their announcement.

We believe every phone can become an AI phone. Here’s what we’ve built, and with the power of open source, you can build even better. https://x.com/Zai_org/status/1999118116086051034

AI Visuals and Charts: Week Ending December 12, 2025

No entries found.

Top 14 Links of The Week – Organized by Category

Agents & Copilots

Voice Assist | AI Voice Agent | AI Phone Answering for Business https://www.callrail.com/voice-assist

Voice Assist in Action | Try Our AI Voice Assistant | CallRail https://www.callrail.com/voice-assist-in-action

Anthropic

Accenture and Anthropic launch multi-year partnership to move enterprises from AI pilots to production \ Anthropic https://www.anthropic.com/news/anthropic-accenture-partnership

We’re expanding our partnership with @Accenture to help enterprises move from AI pilots to production. The Accenture Anthropic Business Group will include 30,000 professionals trained on Claude, and a product to help CIOs scale Claude Code. Read more: https://x.com/AnthropicAI/status/1998412600015769609

Autonomous Vehicles

Huge milestone for Wayve today. @NissanMotor and @wayve_ai have signed definitive agreements to deploy our AI Driver as next-generation ProPILOT series. Our AI will support ADAS and point-to-point driving across mainstream to premium vehicles for global markets. Thank you to https://x.com/alexgkendall/status/1998592238641656160

Business AI

McDonald’s pulls AI-generated Christmas advert following backlash https://www.bbc.com/news/articles/czdgrnvp082o

Ethics, Legal & Security

Historian Thomas Hughes argued that technologies are malleable when young, then harden. Right now we’re still shaping AI, or at least it is being shaped by our institutions, norms & use cases Eventually these systems build a momentum of their own. That is why choices now matter https://x.com/emollick/status/1998184719817793788

Google

Google Online Security Blog: Architecting Security for Agentic Capabilities in Chrome https://security.googleblog.com/2025/12/architecting-security-for-agentic.html

Meta AI

Meta’s multibillion dollar AI strategy overhaul creates culture clash https://www.cnbc.com/2025/12/09/meta-avocado-ai-strategy-issues.html

Microsoft AI

Microsoft Deepens Its Commitment to Canada with Landmark $19B AI Investment – Microsoft On the Issues https://blogs.microsoft.com/on-the-issues/2025/12/09/microsoft-deepens-its-commitment-to-canada-with-landmark-19b-ai-investment/

OpenAI

A year ago, we verified a preview of an unreleased version of @OpenAI o3 (High) that scored 88% on ARC-AGI-1 at est. $4.5k/task Today, we’ve verified a new GPT-5.2 Pro (X-High) SOTA score of 90.5% at $11.64/task This represents a ~390X efficiency improvement in one year https://x.com/arcprize/status/1999182732845547795

OpenAIs latest model GPT-5.2 Thinking still not beating Opus 4.5 at SWE-Bench Verified however SWE-Bench Pro looking juicy over 10% higher score than Sonnet 4.5 https://x.com/scaling01/status/1999182909144519019

the-state-of-enterprise-ai_2025-report.pdf https://cdn.openai.com/pdf/7ef17d82-96bf-4dd1-9df2-228f7f377a29/the-state-of-enterprise-ai_2025-report.pdf

Tech Papers

I meet a lot of very smart AI critics who never seriously try to make AI work for them by spending a couple of hours with a frontier model. People can be (and should be & are) critical after realizing what AI can do, but experience leads to better-informed and sharper critiques. https://x.com/emollick/status/1998398372986736777
