Agents and Copilots: AI News Week Ending 11/21/2025

November 21, 2025

Image created with gemini-2.5-flash-image with claude-sonnet-4-5. Image prompt: Cinematic wide shot of an art deco Emerald City control room with multiple distinct robotic figures operating glowing green control panels and levers, moody theatrical lighting with shafts of light, each robot and workstation highlighted with bright segmentation outlines in different colors, Wicked movie aesthetic with brass and emerald tones, the title AGENTS overlaid in large elegant movie-poster typography.

And pretty much vibe browse with voice, completely changing how the browser is meant to feel on your phone. A true personal assistant. https://x.com/AravSrinivas/status/1991567787408650416

Devin, the AI software engineer, just got its 2025 performance review! Over the past eighteen months, thousands of companies have hired Devin, including Goldman Sachs, Citi, Santander, and Nubank. Using real-world examples and metrics from customers, we looked at where Devin https://x.com/cognition/status/1991218551655657748

Sava (@savatrust) is building an Agentic Trust Company to modernize how $6.5T in U.S. trusts are administered. Driven by how painful it was to set up trusts for his family after selling his S12 company, @nimit is back at YC to build a better way. https://x.com/ycombinator/status/1986841262876729373

LlamaAgents is now in open preview – the fastest way to build, serve, and deploy multi-step document agents that combine LlamaCloud’s document extraction and parsing power with Agent Workflows orchestration. 🚀 Get started instantly with pre-built templates for SEC filings, https://x.com/llama_index/status/1990828159835791697

Let’s goooo! We’ve just launched a new Computer Use Agent (CUA) powered by open models, @huggingface smolagents and @E2B for secure computer sandboxing! We’re building something different. Open. Transparent. Yours. Check this out https://x.com/amir_mahla/status/1991166551945355295

Today, we’re launching the Parallel Search API, the most accurate web search for AI agents, built using our proprietary web index and retrieval infrastructure. Traditional search ranks URLs for humans to click. AI search needs something different: the right tokens in their https://x.com/p0/status/1986479181912539471

🌐 Build agents that can interact with any website Check out this video by @DendriteSystems showing how to build an agent that can interact with websites just like a human would! This video demonstrates a workflow that: – Finds competitors on Product Hunt and Hacker News – https://x.com/LangChainAI/status/1855629502690349326

We introduce Olmo 3, a family of state-of-the-art, fully open language models at the 7B and 32B parameter scales. Olmo 3 model construction targets long context reasoning, function calling, coding, instruction following, general chat, and knowledge recall. https://www.datocms-assets.com/64837/1763662397-1763646865-olmo_3_technical_report-1.pdf

Introducing Claude Code on the web. You can now delegate coding tasks to Claude without opening your terminal. https://x.com/claudeai/status/1980337282323452300

Group chats in ChatGPT are now rolling out globally. After a successful pilot with early testers, group chats will now be available to all logged-in users on ChatGPT Free, Go, Plus and Pro plans. https://x.com/OpenAI/status/1991556363420594270

A new way to collaborate in ChatGPT – Fidji Simo https://fidjisimo.substack.com/p/a-new-way-to-collaborate-in-chatgpt

I had access to Gemini 3. It is a very good, very fast model. It also demonstrates the change from chatbot to agent. https://x.com/emollick/status/1990827310082330971

Gemini 3 has impressive benchmarks for building agents (Vending-Bench 2, Terminal Bench, and Sierra). So we tested its performance as a research agent using Deep Agents. We found Gemini 3 is very effective at using research tools like file manipulation, planning, and subagent https://x.com/LangChainAI/status/1991220334578848209

Google Antigravity is an ‘agent-first’ coding tool built for Gemini 3 | The Verge https://www.theverge.com/news/822833/google-antigravity-ide-coding-agent-gemini-3-pro

Agentic coding today: Gemini 3 spent a few minutes correctly diagnosing the issue. Then, across several rounds of a few minutes of work, failed to actually fix it. Then, GPT 5.1 codex max was able to work for about 15 minutes and solve the problem but introduced a small bug. https://x.com/kylebrussell/status/1991247685672923302

Google @Antigravity is a new agentic platform designed to autonomously plan and execute complex software development tasks. – Access Gemini 3 Pro Preview and other models directly. – Distinct Editor and Agent Manager for synchronous and asynchronous workflows. – Browser Subagent https://x.com/_philschmid/status/1990816850792337454

State of the art reasoning right within an autonomous agent. Complex instruction following and advanced coding capabilities from Gemini 3 Pro helps Jules complete more complex tasks in parallel. Available now for Ultra, Pro coming real soon. 2.5 Pro available to everyone. https://x.com/julesagent/status/1991207201487352222

Google Antigravity is our new agentic development platform. It helps developers build faster by collaborating with AI agents that can autonomously operate across the editor, terminal, and browser. It uses Gemini 3 Pro 🧠 to reason about problems, Gemini 2.5 Computer Use 💻 for https://x.com/GoogleDeepMind/status/1990827890435346787

Introducing Gemini 3 ✨ It’s the best model in the world for multimodal understanding, and our most powerful agentic + vibe coding model yet. Gemini 3 can bring any idea to life, quickly grasping context and intent so you can get what you need with less prompting. Find Gemini https://x.com/sundarpichai/status/1990812770762215649

Gemini 3 Pro takes first place on Stagehands agentic browsing benchmark https://x.com/scaling01/status/1990872758872387939

🚀 Deep Agents: The Weekly Roundup 🚀 We’ve shipped new resources to help you build Deep Agents capable of handling complex, long-running tasks. 1/🥉 Build a Research Agent with Gemini 3 – We tested Gemini 3’s impressive benchmarks in practice using Deep Agents. We found Gemini https://x.com/LangChainAI/status/1991928474404311493

New Gemini 3 reasoning and tool use capabilities are a big step forward for agents 👀 – thinking_level to set reasoning for task requirements – thought signatures for stateful tool use – larger context window for managing drift on complex tasks LangGraph, LangChain, and Deep https://x.com/LangChainAI/status/1991222443298660722

ollama run gemini-3-pro-preview 🧠 State-of-the-art reasoning 🖼️ Deep multimodal understanding 💻 Powerful vibe coding so you can go from prompt to app in one shot ⭐ Improved agentic capabilities, so it can get things done on your behalf, at your direction Gemini 3 Pro is https://x.com/ollama/status/1990839646876553543

Meet Google Antigravity, your new agentic development platform. An evolution of the IDE, it’s built to help you: – Orchestrate agents operating at a higher, task-oriented level – Run parallel tasks with agents across workspaces – Build anything with Gemini 3 Pro. https://x.com/antigravity/status/1990813606217236828

The @GoogleDeepMind team just dropped Gemini 3, and we at LlamaIndex have day-zero support! We also made a little demo to show how you can leverage the advanced agentic capabilities and structured output accuracy of Gemini 3 to automate your GitHub workflow around PRs, you just https://x.com/llama_index/status/1990902918388855185

Hot off the presses is Gemini 3 Pro, Google’s new SOTA model – tops LMArena with a score of 1501 points. Launching simultaneously as an API, inside the consumer Gemini app, Google Search, oh and a new agentic IDE. Here’s the TL;DR: 1. Gemini 3 Pro: new SOTA for multimodality https://x.com/bilawalsidhu/status/1990812584019439988

Gemini 3 Pro sets new record on SWE-bench verified: 74%! (evaluated with minimal agent) Costs are 1.6x of GPT-5, but still cheaper than Sonnet 4.5. Gemini iterates longer than everyone; run your agent with a step limit of >100 for max performance. Details & full agent logs in 🧵 https://x.com/KLieret/status/1991164693839270372

Introducing Gemini 3, the best model in the world for multimodal understanding and our most powerful agentic and vibe-coding model yet. Gemini 3 Pro tops the LMArena Leaderboard at 1501 Elo! Available now in Gemini Enterprise & Vertex AI → https://x.com/GoogleCloudTech/status/1990813342189887831

This is Gemini 3: our most intelligent model that helps you learn, build and plan anything. It comes with state-of-the-art reasoning capabilities, world-leading multimodal understanding, and enables new agentic coding experiences. 🧵 https://x.com/GoogleDeepMind/status/1990812966074376261

On an apples:apples harness comparison (mini-swe-agent: bash-only, same prompts), Gemini 3 Pro sets a new sota on SWE-bench verified: 74.20! https://x.com/ankesh_anand/status/1991199945798365384

Gemini 3 models from @Google @GoogleDeepMind have made a significant 2X SOTA jump on ARC-AGI-2 (Semi-Private Eval) Gemini 3 Pro: 31.11%, $0.81/task Gemini 3 Deep Think (Preview): 45.14%, $77.16/task https://x.com/arcprize/status/1990820655411909018
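To put the two ARC-AGI-2 figures above side by side, here is a minimal arithmetic sketch. The scores and per-task costs are copied from the post; the `ArcResult` class and `compare` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArcResult:
    """One ARC-AGI-2 (Semi-Private Eval) entry as quoted in the post."""
    name: str
    score_pct: float          # percent of tasks solved
    cost_per_task_usd: float  # reported cost per task in USD


# Figures copied from the post above.
PRO = ArcResult("Gemini 3 Pro", 31.11, 0.81)
DEEP_THINK = ArcResult("Gemini 3 Deep Think (Preview)", 45.14, 77.16)


def compare(base: ArcResult, other: ArcResult) -> dict:
    """Return the score gain, cost ratio, and extra cost per extra point."""
    gain = other.score_pct - base.score_pct
    ratio = other.cost_per_task_usd / base.cost_per_task_usd
    extra_cost = other.cost_per_task_usd - base.cost_per_task_usd
    return {
        "score_gain_points": gain,
        "cost_ratio": ratio,
        "usd_per_extra_point": extra_cost / gain,
    }


if __name__ == "__main__":
    for key, value in compare(PRO, DEEP_THINK).items():
        print(f"{key}: {value:.2f}")
```

On these numbers, Deep Think buys roughly 14 extra percentage points at roughly 95 times the per-task cost.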

#Gemini3 is finally out! Congrats to everyone on this amazing launch! Also very excited to see how #DeepThink can power Gemini3 to further the state-of-the-art performances across reasoning, deep knowledge, and multimodality: 41% HLE, 93.8% GPQA, & 45.1% on ARC-AGI-2 (big jump)! https://x.com/lmthang/status/1990816762300960954

Introducing Gemini 3 — our most intelligent model that helps you bring any idea to life. Gemini 3 is our next step on the path toward AGI and has: 🧠 State-of-the-art reasoning 🖼️ Deep multimodal understanding 💻 Powerful vibe coding so you can go from prompt to app in one shot https://x.com/Google/status/1990813116045602942

Gemini 3 scores 31.1% on ARC-AGI-2. Impressive progress. https://x.com/fchollet/status/1990813908483928178

Google to enable research automation on Gemini Enterprise https://www.testingcatalog.com/google-to-enable-research-automation-on-gemini-enterprise/

Gemini 3: Introducing the latest Gemini AI model from Google https://blog.google/products/gemini/gemini-3/#responsible-development

Gemini 3 Pro is the new leader in AI. Google has the leading language model for the first time, with Gemini 3 Pro debuting +3 points above GPT-5.1 in our Artificial Analysis Intelligence Index @GoogleDeepMind gave us pre-release access to Gemini 3 Pro Preview. The model https://x.com/ArtificialAnlys/status/1990813106478715098

My Gemini 3 Review — matt shumer https://shumer.dev/gemini3review

Gemini 3 Pro is rolling out to @code developers! https://x.com/pierceboggan/status/1990817374799528259

This is Gemini 3 ⚡ https://x.com/Google/status/1991196250499133809

Gemini 3 Pro has around ~7.5T params (vibe-mathing with explanation) > the naive fit with an R^2 of 0.8816 yields a mean estimation of 2.325 Quadrillion parameters > ummm, that’s not it > let’s only take sparse MoE reasoning models > this includes gpt-oss-20B and 120B, https://x.com/scaling01/status/1990967279282987068

Google Has Your Data. Gemini Barely Uses It. | Shlok Khemani https://www.shloked.com/writing/gemini-memory

After lots of testing, Gemini 3 is a mixed bag – but a useful addition. Compared to 2.5 Pro, it’s: 1. Worse at transcription and diarization: Adds words that weren’t said, projects emotion, almost like it’s too smart. 2. Not as good at translation or writing: Baseline https://x.com/hrishioa/status/1991691037035884754

Google AI Studio https://aistudio.google.com/apps/bundled/info_genius?show=&showPreview=true&showAssistant=true

One of the most striking things about gemini-3-pro is how much better it is with several iterations. It makes better use of the information from the previous iterations than other models. After one iteration it is barely better than gpt-5.1, while after 5 it is almost 10pp ahead. https://x.com/htihle/status/1991137526480810470

BREAKING: Gemini 3 Pro is out!! It’s state of the art (or close) on coding, reasoning, computer use and more. It’s also extremely fast. We’ve been testing it internally @every for a few hours. Here’s what we’ve noticed so far: – Coding. It rips in @FactoryAI’s Droid. So fast https://x.com/danshipper/status/1990812588511567898

📍GeoGuessr isn’t just a game; it’s a massive test of complicated visual reasoning and world knowledge. Very satisfied to see my efforts helped Gemini pass this test and beat human pros for the first time! Still a long way to go, but a dream milestone just unlocked 🔓 https://x.com/songyoupeng/status/1991214812316201131

Gemini 3 Pro is the first LLM to beat professional human players at GeoGuessr https://x.com/scaling01/status/1990904842488066518

We wrote a Gemini 3 Developer Guide including all new API features, Migration strategies, and technical details for building with Gemini 3 Pro preview: – Control reasoning via `thinking_level` low and high modes. – per part `media_resolution` for better multimodal reasoning – https://x.com/_philschmid/status/1990836465647984969

Gemini 3 Pro Preview now on aistudio Pricing: <=200K tokens • Input: $2.00 / Output: $12.00 > 200K tokens • Input: $4.00 / Output: $18.00 https://x.com/scaling01/status/1990797742629925073
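The two pricing tiers quoted above can be turned into a rough per-request cost estimate. This is a minimal sketch under three assumptions that the post does not state: prices are in USD per 1M tokens, the tier is chosen by the input (prompt) token count, and the whole request is billed at that tier's rates. The `estimate_cost` helper is a hypothetical name, not part of any Google SDK.

```python
# Prices copied from the post; ASSUMED to be USD per 1,000,000 tokens.
SHORT_CONTEXT = {"input": 2.00, "output": 12.00}  # prompts <= 200K tokens
LONG_CONTEXT = {"input": 4.00, "output": 18.00}   # prompts > 200K tokens
TIER_THRESHOLD_TOKENS = 200_000
TOKENS_PER_PRICE_UNIT = 1_000_000


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    ASSUMPTION: the tier is selected by the input token count alone and
    applies to both input and output tokens of the request.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens <= TIER_THRESHOLD_TOKENS:
        tier = SHORT_CONTEXT
    else:
        tier = LONG_CONTEXT
    total = input_tokens * tier["input"] + output_tokens * tier["output"]
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    print(f"100K in / 10K out: ${estimate_cost(100_000, 10_000):.2f}")
    print(f"300K in / 10K out: ${estimate_cost(300_000, 10_000):.2f}")
```

Under these assumptions, crossing the 200K-token threshold doubles the input rate and raises the output rate by half, so long-context requests cost noticeably more per token.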

The model also shows increased resistance to prompt injections and improved protection against cyberattacks. As we continue to advance AI, we are relentlessly focused on ensuring this transformative technology benefits humanity while minimizing potential harms. See our Gemini 3 https://x.com/GoogleDeepMind/status/1991118579119304990

Gemini 3 is here, and it’s built to be our most secure model yet. 🔒 ✅The most comprehensive safety evaluations of any Google AI model to date ✅Rigorous testing against our Frontier Safety Framework ✅Independent assessment by external industry experts https://x.com/GoogleDeepMind/status/1991118575554408556

From Google’s Frontier Safety Report on Gemini 3 Pro: – clear improvements on all CBRN benchmarks especially in LabBench, a benchmark designed that measures performance on practical tasks required for scientific research in biology – on the hardest subset of their https://x.com/scaling01/status/1991177438789857661

The secret behind Gemini 3? Simple: Improving pre-training & post-training 🤯 Pre-training: Contra the popular belief that scaling is over–which we discussed in our NeurIPS ’25 talk with @ilyasut and @quocleix–the team delivered a drastic jump. The delta between 2.5 and 3.0 is https://x.com/OriolVinyalsML/status/1990854455802343680

One great thing about AntiGravity IDE is its agentic Chrome integration It doesn’t just build the frontend, it drives the UI, pokes the controls and then auto tests fixes in the same loop Cursor has this as well but not nearly as smooth. Playwright MCP is too slow in https://x.com/cto_junior/status/1990965505243689094

🚨BREAKING: @GoogleDeepMind’s Gemini-3-Pro is now #1 across all major Arena leaderboards 🥇#1 in Text, Vision, and WebDev – surpassing Grok-4.1, Claude-4.5, and GPT-5 🥇#1 in Coding, Math, Creative Writing, Long Queries, and nearly all occupational leaderboards. Massive gains https://x.com/arena/status/1990813759938703570

This is the biggest performance delta we’ve seen since launching Design Arena Gemini 3.0 Pro has taken #1 overall and #1 in 4 of our 5 code arenas – Website, Game Dev, 3D Design, and UI Components Well-earned congratulations to the @GoogleDeepMind team on a remarkable https://x.com/grx_xce/status/1990815340893245481

Students in the US (and many other countries) can get their hands on all the Gemini 3 Pro goodness for free! https://x.com/demishassabis/status/1990993251247997381

Gemini 3 Pro just took the #1 spot in our new AA-Omniscience Index — but it is a nuanced story AA-Omniscience is our new knowledge and hallucination eval. Gemini 3 Pro’s leadership is driven by its high Accuracy (percentage correct); the model scored a massive 14 points higher https://x.com/ArtificialAnlys/status/1990926803087892506

The Scaling Wall Was A Mirage | Tomasz Tunguz https://tomtunguz.com/gemini-3-proves-pretraining-scaling-laws-intact/

Gemini 3.0 is the next-generation frontier model on our LiveCodeBench Pro benchmark, better than GPT-5/5.1. We’re very excited that Google has adopted our benchmark: a continuously updated collection of problems from Codeforces, ICPC, and IOI designed specifically to minimize https://x.com/wenhaocha1/status/1990818535640088585

Just how significant is the jump with Gemini 3? We just released a new leaderboard to track AI developments. Gemini 3 is the largest leap in a long time. https://x.com/hendrycks/status/1991188096302338491

I played with Gemini 3 yesterday via early access. Few thoughts – First I usually urge caution with public benchmarks because imo they can be quite possible to game. It comes down to discipline and self-restraint of the team (who is meanwhile strongly incentivized otherwise) to https://x.com/karpathy/status/1990854771058913347

Gemini 3 Pro is live in Cline! 1M token context window & a new SOTA on benchmarks. https://x.com/cline/status/1990820473555595389

Gemini 3 Pro is now available in Windsurf https://x.com/cognition/status/1990856307616985163

Gemini 3 Pro set a new record on FrontierMath: 38% on Tiers 1-3 and 19% on Tier 4. On the Epoch Capabilities Index (ECI), which combines multiple benchmarks, Gemini 3 Pro scored 154, up from GPT-5.1’s previous high score of 151. https://x.com/EpochAIResearch/status/1991945942174761050

This is cope. Gemini 3’s behavioral problems are structurally the same as in Gemini 2.5 and earlier. To the extent that it does better, it’s just overcoming its biases with raw horsepower. Gemini post-training is malign since the very first experiments. Cursed bloodline. https://x.com/teortaxesTex/status/1991086733962715540

Introducing Gemini 3 Pro, the world’s most intelligent model that can help you bring anything to life. It is state of the art across most benchmarks, but really comes to life across our products (AI Studio, the Gemini API, Gemini App, etc) 🤯 https://x.com/OfficialLoganK/status/1990813077172822143

Gemini 3 is now in AI Mode — making it even easier to ask anything in Search. Here’s more on this update from @rmstein, VP of Product for Search. https://x.com/Google/status/1991212868620951747

Gemini 3 Pro takes the crown on Scale AI’s VisualToolBench https://x.com/scaling01/status/1991932333147213834

Gemini 3 Pro is the best multimodal model ever. You can now turn a single picture into an almost pixel-perfect website. It’s honestly incredible. And it’s now the default model in @MagicPathAI https://x.com/skirano/status/1991175569388494972

Crushing superiority of Gemini-3-pro-“““preview””” on WeirdML. The gap between 5.1(high) and Gemini is equal to one between o1(high) and o3(high). A generation’s worth of advantage. https://x.com/teortaxesTex/status/1991156784719888588

Google just released Gemini 3, its most intelligent AI model yet. I caught up with Demis Hassabis, CEO of Google DeepMind, to ask about: -Gemini 3 and Google’s AI strategy -Google’s new Antigravity tool -Medical-grade AI Here’s what he said: https://x.com/rowancheung/status/1990814463428059597

Gemini 3 Pro is now available in Cursor! https://x.com/cursor_ai/status/1990814174264381910

Gemini 3 Pro with the largest delta recorded thus far on @Designarena 🤯 https://x.com/OfficialLoganK/status/1990826955730489733

One of the early Gemini 3 tests I did was take the bouncing ball example and try to make it 10x harder, Gemini 3 Pro crush it in 1 shot… (not best of N, literally first prompt made this) https://x.com/OfficialLoganK/status/1990819310072443340

Gemini 3 Pro is now available on OpenRouter https://x.com/scaling01/status/1990817957497155848

Amp’s new default model: Gemini 3 Pro https://x.com/thorstenball/status/1990821112750481744

“Hey, Gemini 3, So I need DOOM, but more root vegetables, also no guns or demons or mars. And more of a focus on different flooring styles. but otherwise EXACTLY the same as DOOM.” Gemini: “Here is F.L.O.O.R. (First-person Lino Observation & Ornamental Review).” Pretty good! https://x.com/emollick/status/1991249261816594896

Gemini is such a weird model – I find it too jagged and unreliable at instruction following to switch to it but where it’s i guess been RL’ed it seems to be big leaps.. https://x.com/Teknium/status/1991815251084628196

We’ve been intensely cooking Gemini 3 for a while now, and we’re so excited and proud to share the results with you all. Of course it tops the leaderboards, including @arena, HLE, GPQA etc, but beyond the benchmarks it’s been by far my favourite model to use for its style and https://x.com/demishassabis/status/1990818891392496005

Gemini 3 Pro has taken the #1 spot on Dubesor Bench https://x.com/scaling01/status/1991931844347207887

Congrats to Google on Gemini 3! Looks like a great model. https://x.com/sama/status/1990828659981144462

Look what we have been cooking for you #Gemini3 ! ✨ Beyond other capabilities, Gemini 3’s spatial understanding and world knowledge are also truly next-level! Incredible to see the progress, and proud to have helped chart some of those new territories!!🚀 https://x.com/songyoupeng/status/1990835604767322523

Ohh no, this is so much worse Maybe the antigravity IDE doesn’t have really good style guidelines for Gemini 3 Should give it a try in cursor https://x.com/cto_junior/status/1990966750746484920

Gemini 3 Pro is still undefeated on the Snake Arena https://x.com/scaling01/status/1991932651968852333

gemini 3 pro • our most intelligent model yet • SOTA reasoning • 1501 Elo on LMArena • next-level vibe coding capabilities • complex multimodal understanding available now in Google AI Studio and the Gemini API https://x.com/GoogleAIStudio/status/1990813281414455385

Gemini 3 Pro on the new Vending-Bench Arena 🤯 tool calling is impressive with this model. https://x.com/OfficialLoganK/status/1990833534672797703

At Box, we’ve been testing Gemini 3 Pro in early access with Box AI on our most complex advanced reasoning eval, and Gemini Pro was a massive 22 percentage point improvement over Gemini 2.5 Pro. For this test, we ask the model a series of complex, real-world questions with a set https://x.com/levie/status/1990820579981840746

And say hello to Gemini 3 Deep Think, even more SOTA compared to Gemini 3 Pro 🤯 https://x.com/OfficialLoganK/status/1990814722250146277

Gemini 3 Pro #1 on PMPP-Eval PMPP = Programming Massively Parallel Processors aka coding with CUDA https://x.com/scaling01/status/1990920793887273396

Gemini 3 is now available in the @GeminiApp. ⚡ Starting today, you’ll be able to: 🧠 Get more helpful, concise responses with easier-to-read formatting. 🧪 Try our new experiments, visual layout and dynamic view, that use Gemini 3 capabilities to make your responses more visual https://x.com/Google/status/1990829896562548855

Gemini 3 Prompting: Best Practices for General Usage https://www.philschmid.de/gemini-3-prompt-practices

Gemini 3: Introducing the latest Gemini AI model from Google https://blog.google/products/gemini/gemini-3/

Gemini 3 Pro (preview) scores 91% on VPCT (spatial reasoning) Uhhhh jesus christ https://x.com/ChaseBrowe32432/status/1990810992931135909

Google Antigravity Blog: introducing-google-antigravity https://antigravity.google/blog/introducing-google-antigravity

nano banana pro (gemini 3 pro image) • SOTA text rendering & localization • granular physics & lighting control • up to 4k studio-quality output • precise character consistency now available in preview on the Gemini API and in Google AI Studio with paid API key https://x.com/GoogleAIStudio/status/1991537543989588445

Nano Banana Pro is taking off. Here are some standout examples from the community so far 🧵 https://x.com/GeminiApp/status/1991570302720163988

Introducing Nano Banana Pro (Gemini 3 Pro Image), Google DeepMind’s most advanced image generation and editing model. Now available on Together AI for production-scale visual content creation with reliable inference. https://x.com/togethercompute/status/1991614379394203973

If you see an image and want to confirm it has been made with Google AI, upload it to the Gemini app and ask a question like “Was this generated with Google AI?” Gemini will check for the SynthID watermark and use its own reasoning to return a response that helps you quickly make https://x.com/Google/status/1991552945754612118

The Gemini app gets new image verification features https://blog.google/technology/ai/ai-image-verification-gemini-app/

We’re launching generative UI features in the @GeminiApp and Google Search, starting with AI Mode, to make information more accessible in new ways. Here’s what to know: – Our generative UI dynamically creates visual layouts and interactive interfaces — such as webpages, games, https://x.com/Google/status/1991270067934216372

Gemini 3 Pro Preview has comparable speeds to Gemini 2.5 Pro, with 128 output tokens per second. This places it ahead of other frontier models including GPT-5.1 (high), Kimi K2 Thinking and Grok 4 https://x.com/ArtificialAnlys/status/1990813128226189811

OpenAI can’t beat Google in consumer AI – by John Hwang https://nextword.substack.com/p/openai-cant-beat-google-in-consumer

If Google really wanted to accelerate science, it should make Deep Research (and Gemini in general) have better retrieval from Google Scholar and Google Books. These are unique repositories that contain a remarkable amount of the world’s academic knowledge in hard-to-access form. https://x.com/emollick/status/1989755741549597039

Try this: have Nano Banana Pro search for you online, then ask it to create what your Instagram profile would look like. It’s a surprisingly good way to visualize your online persona. https://x.com/skirano/status/1991921872330735982

Senior Director of Product Management for Gemini @tulseedoshi breaks down the latest on Gemini 3 and Nano Banana Pro ⬇️ https://x.com/Google/status/1991652494032732443

Fun little Gemini 3 experiment where I asked it “build me a time machine simulator, make it very very good” and then “make it better” a few times. I like that it added calls to Gemini within the application, including adding speech & nano banana images. https://x.com/emollick/status/1990904243239473351

btw if you want extra precision in your editing with Nano Banana Pro and you are an Ultra subscriber, you can use it in the Flow app https://x.com/demishassabis/status/1991662935983419424

Nano Banana PRO is live in LTX. We took it for a test drive, and the results are wild. You’re going to want to save this one… Here’s what’s new 🧵 https://x.com/LTXStudio/status/1991943188379250933

🍌⚡ We put Gemini 2.5 Flash Image “Nano Banana” vs. Gemini 3 Pro Image “Nano Banana Pro” head-to-head… Same prompt. Two different outcomes. Here’s what @GoogleDeepMind shared is new: 🔶 Crisp, clearer text 🔶 4K-ready visuals 🔶 Stronger Gemini 3 reasoning 🔶 Adjustable https://x.com/arena/status/1991652781879620088

Over the past ~8 hours, @yupp_ai users from around the world have been going 🍌🍌for the new Google Nano Banana Pro model – it sits atop our Image leaderboard by a wide margin! Congrats @sundarpichai and @Google for building on Gemini 3.0 to produce the world’s best image model! https://x.com/lintool/status/1991693200822768033

From Gemini 3 to Nano Banana Pro & more, the team has been shipping. Here’s a look at the latest Drops 🧵1/10 https://x.com/GeminiApp/status/1991953958257205641

Google to release Nano Banana Pro next week https://www.testingcatalog.com/google-to-release-nano-banana-pro-powered-by-gemini-3-pro-next-week/

Nano Banana Pro image generation in Gemini: Prompt tips https://blog.google/products/gemini/prompting-tips-nano-banana-pro/

Gemini 3 Pro Image (Nano Banana Pro) – Google DeepMind https://deepmind.google/models/gemini-image/pro/

Nano Banana Pro is wild. I just built a little app in Google AI Studio to help build intuition around AI papers. Paper reading is more fun than ever. 🙂 Images generated by Nano Banana Pro. Gemini 3 + Nano Banana Pro is an insane combo. https://x.com/omarsar0/status/1991657126188773878

Developers can build with Nano Banana Pro (Gemini 3 Pro Image) https://blog.google/technology/developers/gemini-3-pro-image-developers/

These major improvements in accuracy of rendered text are part of why the Nano Banana Pro model is such an upgrade over our earlier Nano Banana model (e.g. error rate goes from 56% for Nano Banana, aka Gemini 2.5 Flash Image, to 8% for Nano Banana Pro, aka Gemini 3 Pro Image). https://x.com/JeffDean/status/1991573065994744091

Nano Banana Pro aka gemini-3-pro-image-preview is the best available image generation model https://simonwillison.net/2025/Nov/20/nano-banana-pro/

🚨🍌BREAKING: @GoogleDeepMind’s Gemini 3 Pro Image aka Nano Banana Pro is in the Arena! Built on Gemini 3, which only two days ago landed as #1 across all major Arena leaderboards. Put it head-to-head in Battle mode with the latest models and judge for yourself if it’s SOTA for https://x.com/arena/status/1991540746114199960

Gemini 3 Pro Image vs GPT-Image 1 https://x.com/scaling01/status/1991546597013160290

Nano Banana Pro: Gemini 3 Pro Image model from Google DeepMind https://blog.google/technology/ai/nano-banana-pro/

Real world users on @yupp_ai prefer Google Nano Banana Pro 🍌🍌an incredible 80+% of the time when compared to competitor models for everyday use cases! https://x.com/lintool/status/1991693562820587926

Starting today for Google AI Ultra subscribers, creating with Nano Banana Pro in Flow means mastering the elements and the lens with precision and control. Watch @sanchitsawaria break down how to transform a single static frame into a cinematic shot: ✅ Change focus to guide the https://x.com/FlowbyGoogle/status/1991620311637283138

Nano Banana Pro (Gemini 3 Pro Image) now available in @GoogleAIStudio and Gemini API 🍌🍌🍌 The model “thinks” through a prompt and can retrieve real-time data, such as weather forecasts or stock charts, using Google Search grounding before generating high-fidelity images https://x.com/_philschmid/status/1991537712420020225

Nano Banana Pro is great at making paper illustrations Here is Attention is All You Need https://x.com/osanseviero/status/1991804629554995247

Nano Banana Pro marks a significant jump in accuracy of rendered text within images across many languages. https://x.com/19kaushiks/status/1991535638676664399

A powerful way to use Nano Banana Pro in @FlowbyGoogle Step 1: Upload an image or generate an image using Imagen or Nano Banana https://x.com/nmatares/status/1991696375403409765

Nano Banana Pro🍌is a bigger milestone than it seems. Watch how it can generate high-fidelity annotated figures and equations from papers. And you can iterate on images using chat! 🤯 Watch until the end. If enough interest, I will try to release the app over the weekend. https://x.com/omarsar0/status/1991911424868970662

Nano Banana Pro, released this morning, is clearly the best image generation model. Superb instruction following, plus it can generate full infographics (with correct spelling and properly rendered text!) from a short prompt based on running extra searches https://x.com/simonw/status/1991545654901133797

Gemini 3 is coming to Google Search, starting with AI Mode. ⚡ Here’s what to know: 🏆 This marks the first time we’ve brought a Gemini model to Search on day one. 🔎 Our newest model brings incredible reasoning power to Search because it’s built to grasp unprecedented depth https://x.com/Google/status/1990845314551447838

The most crushing defeat for OpenAI I did not expect Gemini 3 Pro to be SOTA on WeirdML WeirdML has been an OpenAI stronghold for quite some time. https://x.com/scaling01/status/1991154001283358992

Gemini 3 Pro takes the crown on LisanBench – it scores 2.2x higher than GPT-5 while using 2.4x fewer reasoning tokens – it has the highest score on 23 out of 50 words – Grok-4 is the only model that can keep up https://x.com/scaling01/status/1990845163652993166

The Artificial Analysis leaderboard shows Gemini 3 at 73%, GPT-5.1 at 70%, and Kimi at 67% – minor differences. On our leaderboard, Gemini is 47%, GPT-5.1 is 38%, and Kimi is 27% – Gemini 3 is substantially more capable on hard benchmarks. https://x.com/hendrycks/status/1991188104804208736
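
The contrast in the post above is easier to see as ratios. A minimal Python sketch, using only the percentages quoted in the post; the dictionary keys and the `lead_ratio` helper are illustrative names, not anything published by either leaderboard:

```python
# Scores exactly as quoted in the post above (percent).
scores = {
    "artificial_analysis": {"Gemini 3": 73, "GPT-5.1": 70, "Kimi": 67},
    "quoted_our_leaderboard": {"Gemini 3": 47, "GPT-5.1": 38, "Kimi": 27},
}


def lead_ratio(board: dict, leader: str, other: str) -> float:
    """Return how many times higher the leader's score is than the other's."""
    return board[leader] / board[other]


for name, board in scores.items():
    for other in ("GPT-5.1", "Kimi"):
        ratio = lead_ratio(board, "Gemini 3", other)
        print(f"{name}: Gemini 3 vs {other} = {ratio:.2f}x")
```

On the first leaderboard the top score is under 1.1x either of the others; on the second it is roughly 1.2x and 1.7x, which is the gap the post describes as substantial.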

Gemini Robotics 1.5 features a separate reasoning engine (ER), but its VLA model is also capable of thinking due to interleaved reasoning tokens. The VLA is able to independently operate long autonomous sequences (15+ minutes) without aid from the ER/VLM. https://x.com/TheHumanoidHub/status/1989393094631199088

In honor of today’s 🍌🍌 launch we decided to release not one but TWO new outputs: Starting with Infographics! Create customizable, high-quality, visual summaries of your sources. Information never looked so good. Rolling out to Pro users now and free users in the coming weeks! https://x.com/NotebookLM/status/1991574926046687683?s=20

Introducing Manus Browser Operator https://manus.im/blog/manus-browser-operator

Manus AI launches Browser Operator extension https://www.testingcatalog.com/manus-ai-launches-browser-operator-extension/

We estimate that Kimi K2 Thinking has a 50%-time-horizon of around 54 minutes (95% confidence interval of 25 to 100 minutes) on our agentic SWE tasks. Note that we conducted this evaluation through a third-party inference provider, which reduces our confidence in this estimate. https://x.com/METR_Evals/status/1991658241932292537

Kimi K2 Thinking is impressive. So I built a multi-agent deep researcher, Kimi Deep Researcher. It generates long research reports on any topic, powered by subagents (web searcher, analyzer, and synthesizer). It can do 100s of tool calls per session. Repo soon! https://x.com/omarsar0/status/1988974710592516454

🤗 Kimi-k2-Thinking has reached top performance on the latest IMO-level reasoning benchmark, AMO-Bench from Meituan Longcat! https://x.com/Kimi_Moonshot/status/1991139250566545886

Kimi-K2 Thinking gets the same score on METR as Claude 3.7 Sonnet. As I was saying, open-source is 9 months behind frontier labs on agentic, long-context reasoning tasks. It’s still an improvement, and open-source models seem to be on their own exponential, but I heavily suspect https://x.com/scaling01/status/1991665386513748172

Build a coding agent with GPT 5.1 https://cookbook.openai.com/examples/build_a_coding_agent_with_gpt-5.1

gpt-5.1-codex is genuinely cracked – the strongest agentic coding model available right now. what’s becoming clear is the increasing importance of the model + the harness + the tools. https://x.com/shyamalanadkat/status/1989184364727632348

OpenAI just published an ace 28-page guide on context engineering for AI agents. Instead of throwing more memory at LLMs, it shows how to engineer context: when to trim, summarize, prevent drift, and defend against context poisoning. 100% free. Link to the guide in 🧵↓ https://x.com/DataChaz/status/1988581390452249022

GPT-5.1 Pro is rolling out today to all Pro users. It delivers clearer, more capable answers for complex work, with strong gains in writing help, data science, and business tasks. https://x.com/OpenAI/status/1991266192905179613?s=20

Today we at @OpenAI are releasing GPT-5.1-Codex-Max, which can work autonomously for more than a day over millions of tokens. Pretraining hasn’t hit a wall, and neither has test-time compute. Congrats to my teammates @kevinleestone & @mikegmalek for helping to make it possible! https://x.com/polynoamial/status/1991212955250327768

Building more with GPT-5.1-Codex-Max | OpenAI https://openai.com/index/gpt-5-1-codex-max/

New Codex model is a significant improvement! https://x.com/sama/status/1991258606168338444

GPT-5.1 (High) coming in on par with GPT-5 Pro on ARC-AGI but nearly an OOM cheaper https://x.com/GregKamradt/status/1990501297095909486

GPT-5.1-Thinking-high finally beats Grok-4 on ARC-AGI-2 https://x.com/scaling01/status/1990506507125895444

GPT-5.1-Codex-Max beats GPT-5.1 by 8% on OpenAI internal pull requests https://x.com/scaling01/status/1991219951932489738

My GPT-5.1 Pro Review — matt shumer https://shumer.dev/gpt51proreview

GPT-5.1-Codex-Max is out (API coming soon)! • Outperforms GPT-5.1-Codex and more efficient • Natively trained with compaction to handle long-running tasks • New “Extra High” reasoning effort for your hardest problems $ npm install -g @openai/codex@latest https://x.com/dkundel/status/1991224903031210453

GPT-5.1-Codex-Max is new SOTA on METR https://x.com/scaling01/status/1991220418535936302

GPT-5.1-Codex was released six days ago, now we have GPT-5.1-Codex-Max. (The use of every naming scheme piled on top of each other, from version numbers to qualifiers like Max, makes it hard to see how big a deal each release is, but this looks like a big jump in ability) https://x.com/emollick/status/1991220527550157282

GPT-5.1-Codex-Max shows big improvements in CTF https://x.com/scaling01/status/1991218908833939818

New model is out in Codex. Gets to the same quality of solution faster and raises the ceiling for how complex a task is achievable. $ codex -m gpt-5.1-codex-max Best experienced in the latest CLI version 0.59, which also packs a lot of other fixes and improvements. https://x.com/thsottiaux/status/1991210545253609875

Introducing group chats in ChatGPT | OpenAI https://openai.com/index/group-chats-in-chatgpt/

GPT-5.1-Codex-Max improves over GPT-5.1’s PaperBench score (replicate state-of-the-art AI research) https://x.com/scaling01/status/1991219458426433729

GPT-5.1-Codex-Max shows progress on MLE-bench https://x.com/scaling01/status/1991219683450843145

GPT-5.1 is now available in the API. Pricing is the same as GPT-5. We are also releasing gpt-5.1-codex and gpt-5.1-codex-mini in the API, specialized for long-running coding tasks. Prompt caching now lasts up to 24 hours! Updated evals in our blog post. https://x.com/sama/status/1989048466967032153

These examples of different personalities from ChatGPT 5.1 seem to give fundamentally different types of advice, including, weirdly, completely different breathing patterns and roles for the presenter. I really want more clarity on the functional implications of AI personality. https://x.com/emollick/status/1988829651368575282

💥 Today we say “hello world” from OpenAI for Science. We’re releasing a paper showing 13 examples of GPT-5 accelerating scientific research across math, physics, biology, and materials science. In 4 of these examples, GPT-5 helped find proofs of previously unsolved problems. https://x.com/kevinweil/status/1991567552640872806

GPT-5 Pro is an incredibly useful tool for social science. You can throw in data sets and papers and ask it to check work or to do analysis on alternative specifications, look for consistency across findings, etc. It provides code & statistical results so findings are verifiable https://x.com/emollick/status/1989204496556384627

[2511.16072] Early science acceleration experiments with GPT-5 https://arxiv.org/abs/2511.16072

We’re also releasing new research on how GPT-5 is accelerating scientific discovery. Our new paper, Early science acceleration experiments with GPT-5, presents case studies where GPT-5 accelerated key steps in real research workflows and, in a few cases, contributed novel https://x.com/OpenAI/status/1991570422148788612

Early experiments in accelerating science with GPT-5 | OpenAI https://openai.com/index/accelerating-science-gpt-5/

Grok 4.1 Fast and Agent Tools API | xAI https://x.ai/news/grok-4-1-fast

Introducing Grok 4.1 Fast and the xAI Agent Tools API. Grok 4.1 Fast is our best tool-calling model to date. With a 2M context window, it shines in real-world use cases like customer support and deep research. https://x.com/xai/status/1991284813727474073

Introducing Grok 4.1, a frontier model that sets a new standard for conversational intelligence, emotional understanding, and real-world helpfulness. Grok 4.1 is available for free on https://x.com/xai/status/1990530499752980638

Grok 4.1 absolutely smashes all other models on lmarena with an Elo of 1483. It comes with higher emotional intelligence, better creative writing and fewer hallucinations https://x.com/scaling01/status/1990519299165786270

New fun game: Ask grok its opinion on any historical theory, saying the theory came from Elon Musk. Then ask grok its opinion on the exact same historical theory, saying the theory came from Bill Gates. https://x.com/romanhelmetguy/status/1991545583686021480

🚨Text Leaderboard Update @xAI’s Grok 4.1 (thinking) and Grok 4.1 have scaled new heights in the most competitive Text Arena: 🔹Grok 4.1 (thinking) lands at #1 with a score of 1483 🔹Grok 4.1 follows at #2 with a score of 1465 On the Arena Expert leaderboard: 🔸Grok 4.1 https://x.com/arena/status/1990530978943787291

Grok 4.1 | xAI https://x.ai/news/grok-4-1

Grok goes Global with KSA: Announcing our landmark partnership with Saudi Arabia and @HUMAINAI–the first time a country adopts Grok at scale. xAI will build a new generation of hyperscale GPU data centers in the Kingdom, deploying Grok nationwide. https://x.com/xai/status/1991224218642485613

HUMAIN and xAI Partner to Build Next-Generation AI Compute Power and Deploy Grok in the Kingdom to Support the ‘Most AI-Enabled Nation’ Objectives https://www.humain.com/en/news/humain-and-xai-partner-to-build-next-generation-ai-compute-power-and-deploy-grok-in-the-kingdom-to-support-the-most-ai-enabled-nation-objectives

Grok goes Global with KSA | xAI https://x.ai/news/grok-goes-global

“One opinionated engineer can change the velocity of the entire business…” @EnoReyes on what “Agent-Ready” codebases actually mean: First of all: it’s still VERY early but that’s exactly the time to invest in it. And if you invest in validation stack – that becomes your moat. https://x.com/TheTuringPost/status/1991953335683842326

Most AI agents today work better alone. Not because collaboration doesn’t matter. It’s because training multi-agent systems is fundamentally harder. Single-agent RL optimizes individual performance. But complex research requires coordination like information exchange and https://x.com/dair_ai/status/1991242085928943895

Scaling Agent Learning via Experience Synthesis 📝: https://x.com/jaseweston/status/1986613046047846569

Deep Agents is a fantastic example of how adding just a few middlewares (subagents, file system, summarization) can take your agents to the next level. What middlewares do you want to see next in LangChain? https://x.com/sydneyrunkle/status/1990786810436559168

Training-Free Group Relative Policy Optimization by @TencentCloudADP Youtu-Agent Team: Paper: https://x.com/TheTuringPost/status/1990373305053069410

Ever watched your AI agent get stuck in an endless back-and-forth? 🤔 Answering the same question 3 different ways? Or chatting forever because the user went off the rails? 🙈 Yeah… that’s a sign your agent needs to know when to stop. 🙅 In my new video I show how to add https://x.com/bromann/status/1990819790294745573

AI agents are amazing… until they start acting like unhinged interns🤪 Without guardrails, they can overshoot rate limits 📈, blow through API quotas 💸, or trigger endless tool call loops 🔁. @LangChainAI’s Tool Call Limit Middleware gives you precise control: per run, per https://x.com/bromann/status/1991544566563189022
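
The guardrail idea in the post above can be sketched without any framework. This is a minimal, hypothetical Python illustration of capping tool calls per run and per tool; `ToolCallLimiter` and `ToolCallLimitExceeded` are made-up names for this sketch and are not LangChain's middleware API, whose actual interface should be checked in its documentation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


class ToolCallLimitExceeded(RuntimeError):
    """Raised when a call would exceed the configured budget."""


@dataclass
class ToolCallLimiter:
    # Hypothetical helper for illustration only (not a LangChain class).
    run_limit: int                                   # max tool calls per run
    per_tool_limits: Dict[str, int] = field(default_factory=dict)
    _total: int = field(default=0, init=False)
    _per_tool: Dict[str, int] = field(default_factory=dict, init=False)

    def call(self, name: str, tool: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        """Run `tool` only if both the run budget and the tool budget allow it."""
        if self._total >= self.run_limit:
            raise ToolCallLimitExceeded(f"run limit of {self.run_limit} reached")
        used = self._per_tool.get(name, 0)
        tool_limit = self.per_tool_limits.get(name)
        if tool_limit is not None and used >= tool_limit:
            raise ToolCallLimitExceeded(f"limit of {tool_limit} reached for {name!r}")
        # Count the call before executing so a failing tool still consumes budget.
        self._total += 1
        self._per_tool[name] = used + 1
        return tool(*args, **kwargs)


limiter = ToolCallLimiter(run_limit=3, per_tool_limits={"search": 2})
print(limiter.call("search", str.upper, "first query"))
```

Counting before executing is a deliberate choice here: a tool that keeps raising errors inside a retry loop still runs out of budget instead of looping forever.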

The real power of AI agents is collaboration. Box CTO @BenAtBox joins @LangChainAI CEO @hwchase17 to discuss what the future looks like when AI systems can work more like teammates than just assistants. Timestamps: 00:52 – Harrison’s path to starting LangChain 01:54 – The https://x.com/Box/status/1991582582920839354

Day-4 of Kaggle AI Agents. We just released a 50-page guide covering Agent Quality, Evaluation and Observability. 100% free. https://x.com/Saboo_Shubham_/status/1988810820428648501

Big moment for Postgres! AI agents broke the idea of what a database is supposed to do. Traditional databases were built for humans, and Agents broke that model. – They branch endlessly. – They run ten experiments at once. – They need isolation, context, memory, structured https://x.com/_avichawla/status/1991031261427872028

Discover how Taskade’s AI agents are transforming project management! Say goodbye to tedious tasks and hello to Automated Task Management. Get ready for a future of fully streamlined workflows! 🚀 https://x.com/Taskade/status/1856484091023278574

Agent Labs: Welcome to GPT Wrapper Summer – by swyx (Shawn) https://www.latent.space/p/agent-labs

🎨 Introducing Prompt Canvas — a novel UX for prompt engineering Building LLM applications requires new and dedicated tools for prompt engineering. With Prompt Canvas in LangSmith, you can: • Collaborate with an AI agent to draft, refine, and edit your prompts • Define custom https://x.com/LangChainAI/status/1856386593457848746

Unlocking the Power of Multi-Agent LLM for Reasoning Designing and optimizing multi-agent systems is important. This paper analyzes multi‑agent systems where one meta‑thinking agent plans and another reasoning agent executes, and identifies a lazy agent failure mode. They find https://x.com/omarsar0/status/1986831275144138756

Using Agent Communication Protocol for Multi-Agent Orchestration. When autonomous agents collaborate, how they talk matters more than what they talk about. That’s where communication protocols come in — they define the rules of interaction. Here’s a quick tour of the most used https://x.com/rohit4verse/status/1988635229339435056

🎬 The @LangChainAI middleware series continues! Build reliability into your agentic applications with LangChain’s new model fallback middleware. Select any number of backup models (even across providers) to support consistent service for your users in the face of unpredictable https://x.com/sydneyrunkle/status/1990828290244751634
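
The fallback pattern described in the post above can be illustrated in a few lines of plain Python. This is a sketch under stated assumptions, not LangChain's middleware: `call_with_fallback`, `flaky_primary`, and `backup` are hypothetical names, and each "model" is just a function from prompt to text.

```python
from typing import Callable, List, Sequence

# A "model" here is any callable taking a prompt and returning text (assumption).
ModelFn = Callable[[str], str]


def call_with_fallback(prompt: str, models: Sequence[ModelFn]) -> str:
    """Try each model in order and return the first successful answer."""
    errors: List[Exception] = []
    for model in models:
        try:
            return model(prompt)
        except Exception as exc:  # a real system would catch narrower error types
            errors.append(exc)
    raise RuntimeError(f"all {len(models)} models failed: {errors}")


def flaky_primary(prompt: str) -> str:
    # Stand-in for a provider outage.
    raise TimeoutError("primary model unavailable")


def backup(prompt: str) -> str:
    # Stand-in for a backup model, possibly from another provider.
    return f"backup answer to: {prompt}"


print(call_with_fallback("summarize this week", [flaky_primary, backup]))
```

Ordering the list from preferred to least preferred keeps the policy explicit, and the collected errors make it possible to see why each earlier model was skipped.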

It’s pretty clear that AI agents can inhale documents and perform knowledge work. It’s also pretty clear humans need to be able to observe this process to prevent things from going off the rails. One of the biggest benefits towards orchestrating document workflows through code https://x.com/jerryjliu0/status/1991196434843222145

Some pretty eye-opening data on the effect of AI coding. When Cursor added agentic coding in 2024, adopters produced 39% more code merges, with no sign of a decrease in quality (revert rates were the same, bugs dropped) and no sign that the scope of the work shrank. Big impact. https://x.com/emollick/status/1988837620248412406

End-to-end RL for agents may be more sample-efficient than previously assumed. I’ve been following developments in LLM agent training closely, and this new research paper introduces an important idea. Agent-R1 is an end-to-end reinforcement learning framework specifically https://x.com/omarsar0/status/1991190120016540054

Agentic Document Workflows are crucial for AI-driven knowledge work and automation, but they are often treated as black boxes, which leads to silent failures and unexpected behaviors. With our Agent Workflows you don’t have to worry about not knowing what is happening behind the https://x.com/llama_index/status/1991183958164553959

Excited to announce Sakana AI’s Series B! 🐟 From day one, Sakana AI has done things differently. Our research has always focused on developing efficient AI technology sustainably, driven by the belief that resource constraints-not limitless compute-are key to true innovation. https://x.com/hardmaru/status/1990204623471395284

We have too many benchmarks on model ability, and too few on agentic work. Increasingly, what matters economically is not the ability of AIs to get a question right through an API call, but rather its ability to combine tools & ability to solve a problem. That is under-measured. https://x.com/emollick/status/1990076061254668403

Today we’re releasing Deep Research Tulu (DR Tulu)–the first fully open, end-to-end recipe for long-form deep research, plus an 8B agent you can use right away. Train agents that plan, search, synthesize, & cite across sources, making expert research more accessible. 🧭📚 https://x.com/allen_ai/status/1990803193014395004

“We are not trying to build AGI, we’re trying to build very good coding models.” Thank god. Great presentation by @leerob from @cursor_ai and their early reflections on the new Composer. What stood out for me: • Training happens inside a production-grade replica of Cursor’s https://x.com/TheTuringPost/status/1991888391508496758

In this afternoon’s talk at @aiDotEngineer’s AI Engineer CODE Summit / AIE LEAD (2:25 PM ET), Lei Zhang, our Head of Technology Infrastructure, discusses the complex & nuanced reality of deploying #AI at scale in a mature software engineering organization https://x.com/TechAtBloomberg/status/1991563444374389018

Together Instant Clusters are built for human AI researchers and engineers — and the autonomous agents they create. https://x.com/togethercompute/status/1990858090217156693

Cursor 2.1 is now available! Plans now have an interactive UI for answering clarifying questions. Also new: in-editor code reviews, instant grep, and improved browser use. https://x.com/cursor_ai/status/1991967045542646059

Parallel Web Systems: “Today, we’re launching Parallel Extract, a new API in our Agent Tools bundle. When given a URL, Extract fetches all content from that page and returns it in markdown, either in full detail or in a compressed form for better token efficiency.” https://x.com/p0/status/1991568991954034727

Staying in control as AI becomes part of your coding workflow is essential. 🧵 We just released features that enhance security and transparency in @code: https://x.com/code/status/1991549116149592330

MiroThinker Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling https://x.com/_akhaliq/status/1990870542379843726

Introducing claude-agent-server – run Claude Agent (the harness behind Claude Code) in a cloud sandbox and control it via websocket. Claude Agent is actually a great harness for a general agent, not just coding. BUT it’s hard to integrate because it’s meant to run locally. https://x.com/dzhng/status/1991154972558581889

The new Claude Code front-end design Skill is a big improvement, ends the purple gradient and Arial font tyranny and actually considers audience. I have been having it go over some of my vibe coded apps and redoing the design to make them match the use. Lots of value in Skills. https://x.com/emollick/status/1989093191182926249

Integrating Agentic RAG Systems via MCP 👇 If you are building RAG systems and packing many data sources for retrieval, most likely there is some agency present at least at the data source selection for retrieval stage. This is how MCP enriches the evolution of https://x.com/Aurimas_Gr/status/1986793469822529640

Get a Free Custom Voice Agent | Enterprise-Grade AI for Calls https://www.bland.ai/customagent

Voice Assist in Action | Try Our AI Voice Assistant | CallRail https://www.callrail.com/voice-assist-in-action

Our team can often be observed talking into Cline. But it’s cumbersome when the speech model you’re using doesn’t speak the language of a developer. Cline’s voice mode now runs on Avalon from @aquavoice_, a speech model tuned for how engineers actually talk: “checkout dev”, https://x.com/cline/status/1990527639816421860

We are announcing cline-bench, a real world open source benchmark for agentic coding. cline-bench is built from real world engineering tasks from participating developers where frontier models failed and humans had to step in. Each accepted task becomes a fully reproducible https://x.com/pashmerepat/status/1991596028735184899

Introducing cline-bench: A Real-World, Open Source Benchmark for Agentic Coding – Cline Blog https://cline.bot/blog/cline-bench-initiative

This AI agent boosted user satisfaction by 70% and handles thousands of messages daily. Here’s how @bookingcom built it with Weaviate and GPT-4. The agent handles tens of thousands of guest messages daily, using Weaviate as the vector database to retrieve relevant response https://x.com/weaviate_io/status/1991884601392779564

cline-bench is a collaborative benchmark. The best tasks will come from developers working on challenging engineering problems in open source repos. There are two ways to contribute: 1. opting into cline-bench and using the Cline Provider on open source repos (these difficult https://x.com/cline/status/1991673421957365837

Today we are announcing cline-bench, an open source benchmark of real world agentic coding tasks that we are building together with the community. cline-bench turns difficult Cline tasks from open source repos into containerized RL environments, with real repo snapshots, real https://x.com/cline/status/1991612268220752130

Aide is an open-source AI native code editor built on top of the agentic framework. It’s SOTA at 43% on swebench-lite and has all the features you expect out of Cursor/Copilot, with complete data privacy and plug-and-play LLM integration. https://x.com/ycombinator/status/1854237314651980257

One of the reasons why AI agents combined with SaaS is such a big deal is that software now brings along the work with it. Most software tools go underutilized relative to what they’re capable of merely because the customer doesn’t have the resources necessary to fully leverage https://x.com/levie/status/1988788206821863438

Notion Product Tour https://demo.hellonotion.com/psl/hn5y0t02?g=cmhxtu7dc005c04jt10z1f96x&s=0

We built this AI Agent at Varick. If you want to free up hundreds of employee hours with a bookkeeping agent that: – generates invoices / monitors inboxes for incoming invoices – propagates that information autonomously to bill.com / basware / NetSuite / etc – classifies https://x.com/vasuman/status/1988770670113878183

Build a document understanding agent for SEC filings that uses a multi-step approach with LlamaClassify and Extract to identify the filing type and hand it off to the right extraction agent. Deployed with LlamaAgents. 🔧 Customize extraction schemas to fit your specific data https://x.com/llama_index/status/1988696219015848401

Enterprises can now bring their own key in @GitHubCopilot! Unlike client BYOK available in @code, once configured: 1. every user in your organization has access to the model without having to self-configure, 2. it works across all Copilot surfaces https://x.com/pierceboggan/status/1991612120312770600

Input tokens: 1048576 Output tokens: 65536 “Our most intelligent model with SOTA reasoning and multimodal understanding, and powerful agentic and vibe coding capabilities” https://x.com/scaling01/status/1990803527887626446

Cursor and Claude Code just got their first serious competition with this new IDE. it gives you beautiful visual controls over the code with zero annoying abstractions. this is my honest review and what I really liked about it: → it’s built both for non-technicals and pro devs https://x.com/Hesamation/status/1988747662787576234

Here’s the usage of their MCP server. It’s created based on 35 years of Postgres knowledge, and full access to Postgres docs, all in a format that agents can easily process. You can try this live in Tiger Data’s Free Tier here: https://x.com/_avichawla/status/1991031330604458344

we’re excited about our continued collaboration with @cline on pushing real-world coding performance on their harness. cline-bench is a great example of how open, real-world benchmarks can move the whole ecosystem forward. high-quality, verified coding tasks grounded in actual https://x.com/shyamalanadkat/status/1991603916115775932

Tired of evaluating LLMs on made-up problems that look nothing like real tasks? Introducing EDIT-Bench, a code editing benchmark built from in-the-wild user interactions in VSCode. Real-world edits are challenging: only 1/40 models score > 60% pass@1. https://x.com/iamwaynechi/status/1991211138902536326

Most coding benchmarks still look like leetcode style puzzles. A typical task is something like “write me a server that generates fibonacci sequences from scratch,” in a blank file, with no repo, no history, and no real engineering constraints. cline-bench starts from the https://x.com/cline/status/1991930365821456526

Sakana AI takes crown as Japan’s most valuable unicorn – Nikkei Asia https://asia.nikkei.com/business/technology/artificial-intelligence/sakana-ai-takes-crown-as-japan-s-most-valuable-unicorn

Google released another whitepaper 🔥. it’s a masterclass on building intelligent optimized sessions and long-term memories for agents that actually work. It covers: > context engineering best practices > comparing memory and RAG > memory-as-a-tool pattern > agent-to-agent (A2A) https://x.com/Hesamation/status/1988750893957730396

Gemini 3 and GPT 5.1 are now live in the W&B Weave Playground. You can now test these new models side-by-side, refine your prompts, and evaluate their performance against your actual production traces. Weave helps you experiment and build better AI agents faster. Try it below! https://x.com/weave_wb/status/1991601539728003200

Are you seriously telling me Gemini 3 chat does not have search capabilities?? Is this even a Google product? https://x.com/Teknium/status/1991059260193792204

Cline 3.38.0 is out now. This release brings Gemini 3 Pro Preview support, @aquavoice_ Avalon as the new model for speech to text and a series of bug fixes in context truncation and native tool calling. Here is what’s new: https://x.com/cline/status/1991215206413017252

Gemini 3 Pro is live across Vercel AI Cloud, and it’s available: • on AI Gateway using google/gemini-3-pro-preview • as a model on https://x.com/vercel/status/1990816243138633917

Noooooo Gemini, what is this unholy fix https://x.com/cto_junior/status/1990988738298839278

Announcing Design, a new Replit experience focused on beautiful UIs. The first non-slop AI design experience, powered by Gemini 3.0 https://x.com/amasad/status/1990859423942893816?s=20

Try Nano Banana Pro NOW on Together AI: https://x.com/togethercompute/status/1991954662606635391

PRO for PROs Nano Banana PRO is available at no cost for @huggingface PRO subscribers on Spaces, go bananas 🍌 https://x.com/multimodalart/status/1991549140627775511

Don’t underestimate the importance of a good harness that fits the model. In terminal-bench2, GPT-5.1-Codex goes from 16th place (36%) using Terminus 2 to 1st place (57%) using Codex CLI. Gemini 3 Pro enters at #2 with Terminus 2. https://x.com/tristanzajonc/status/1990879703935103256

It’s truly in a league of its own, this was one shotted https://x.com/cto_junior/status/1991564259516702997

Sakana AI announces its Series B funding round https://x.com/SakanaAILabs/status/1990212217216880829

Sakana AI raises $135M Series B at a $2.65B valuation to continue building AI models for Japan https://x.com/TechCrunch/status/1990388003525787710

Sakana AI raises $135M Series B at a $2.65B valuation to continue building AI models for Japan | TechCrunch https://techcrunch.com/2025/11/17/sakana-ai-raises-135m-series-b-at-a-2-65b-valuation-to-continue-building-ai-models-for-japan/

Microsoft 🤝 AG-UI Protocol @Microsoft Agent Framework is now AG-UI compatible! 🎉 Connect your MS Agent Framework .NET agents into fullstack applications with Python coming soon 👀 MS Agent Framework takes care of the agentic backend, AG-UI connects your agents to the https://x.com/CopilotKit/status/1988274720358494501

Build distributed AI agent networks with type-safe async messaging with AutoGen from @Microsoft. 33.6K stars 🌟 on GitHub. What it offers: → Multi-language support (Python, .NET) with typed interfaces and enforced type checking → Modular architecture with three API layers: https://x.com/rohanpaul_ai/status/1856409023798812953

Microsoft Agent 365: The control plane for AI agents | Microsoft 365 Blog https://www.microsoft.com/en-us/microsoft-365/blog/2025/11/18/microsoft-agent-365-the-control-plane-for-ai-agents/

Claude is now available in public preview in Microsoft Foundry. @Azure customers can now use Claude Sonnet 4.5, Haiku 4.5, and Opus 4.1 to build applications and enterprise agents, and to power Claude Code. https://x.com/claudeai/status/1990798185259020501

Proud that @MicrosoftEdge is the first AI browser in the enterprise. – agentic actions – daily briefing – YouTube summarization – multi-tab analysis – search your history with natural language Bringing together the best of AI and enterprise-grade security. https://x.com/mustafasuleyman/status/1990812248886161733

Claude now available in Microsoft Foundry and Microsoft 365 Copilot \ Anthropic https://www.anthropic.com/news/claude-in-microsoft-foundry

Perplexity Pro and Max subscribers now have access to Kimi-K2 Thinking and Gemini 3 Pro. https://x.com/perplexity_ai/status/1991614227950498236

🚨Leaderboard Update New model provider in the Arena: @DeepCogito has released Cogito v2.1 (MIT licensed) 🔹Top 10 Open Source Model for WebDev, rank #10 🔹Tie ranks #18 overall for WebDev This puts Cogito v2.1 on par with community favorites like Qwen 3 Coder Plus & Kimi K2 https://x.com/arena/status/1991211903331496351

Document AI goes beyond traditional OCR to create intelligent systems that read, understand, and act on documents like humans do. Our latest blog post explains how agentic OCR combined with LLM-powered workflows is transforming document automation across industries: 🧠 Agentic https://x.com/llama_index/status/1990465974357950625

OpenAI is shifting the center of gravity from “prompt your agent better” to “train your agent inside your world.” The vibe was “okay, we know prompting isn’t enough anymore, here are the knobs you actually need.” Their tech staff – Will and Cathy- were talking about Agent RFT https://x.com/TheTuringPost/status/1991920970555162956

Science underpins medicine, energy, and national security, yet progress remains slow. Early experiments with university and national-lab partners show GPT-5 helping researchers explore ideas and reach insight faster. Hear directly from OpenAI researchers behind the work: https://x.com/OpenAI/status/1991569987933458814

early-science-acceleration-experiments-with-gpt-5.pdf https://cdn.openai.com/pdf/4a25f921-e4e0-479a-9b38-5367b47e8fd0/early-science-acceleration-experiments-with-gpt-5.pdf

Imagine how unhobbled Codex will be once we roll out a proper fix to this. In the meantime, believe me, there are … technical reasons for why this is. Not that I like them. https://x.com/thsottiaux/status/1989940347494084683

🍐@trypearai (YC F24) is an open source AI code editor with a curated inventory of the best AI tools, natively integrated for effortless AI-powered coding. They’re building a flexible framework for the AI coding tech stack under a unified UX: https://x.com/ycombinator/status/1856441845880107408

Just watched the Gradio 6 launch livestream, wow this is a groundbreaking update! With the “Super HTML” component, Gradio effectively evolves from product to platform–you can build an entire app with nothing but Super HTML components. Great job @anotheraliabid and @abidlabs! https://x.com/cocktailpeanut/status/1991932424121639066