Image created with gemini-2.5-flash-image with claude-sonnet-4-5. Image prompt: Wide cinematic shot of a sleek orbital command station in deep space surrounded by hundreds of small glowing autonomous drone agents moving in coordinated swarm patterns, each leaving subtle blue data trails, dramatic rim lighting against black space with distant stars, cool blue and muted green color palette, high contrast sci-fi aesthetic inspired by Ender’s Game strategic systems, emphasizing both individual autonomy and collective intelligence at epic scale.
Lovable says it’s nearing 8 million users as the year-old AI coding startup eyes more corporate employees | TechCrunch https://techcrunch.com/2025/11/10/lovable-says-its-nearing-8-million-users-as-the-year-old-ai-coding-startup-eyes-more-corporate-employees/
We’re releasing🍨Gelato-30B-A3B, a state-of-the-art computer grounding model that delivers immediate performance gains for computer-use agents! Trained on our open-source🖱️Click-100k dataset, Gelato achieves 63.8% on ScreenSpot-Pro and 69.1% on OS-World-G. It outperforms https://x.com/anas_awadalla/status/1987913284989985092
one of our engineers left Claude Code to run overnight for a large migration task. in the morning he started asking it questions about what it did and… POOF! it reverted everything https://x.com/imjaredz/status/1988379604160311696
but likely an autocomplete error the date is in the past and there’s a claude-sonnet-4-5-20250929 https://x.com/scaling01/status/1989145863508394059
We believe this is the first documented case of a large-scale AI cyberattack executed without substantial human intervention. It has significant implications for cybersecurity in the age of AI agents. Read more: https://x.com/AnthropicAI/status/1989033795341648052
Full report: Disrupting the first reported AI-orchestrated cyber espionage campaign https://assets.anthropic.com/m/ec212e6566a0d47/original/Disrupting-the-first-reported-AI-orchestrated-cyber-espionage-campaign.pdf
Disrupting the first reported AI-orchestrated cyber espionage campaign \ Anthropic https://www.anthropic.com/news/disrupting-AI-espionage
We disrupted a highly sophisticated AI-led espionage campaign. The attack targeted large tech companies, financial institutions, chemical manufacturing companies, and government agencies. We assess with high confidence that the threat actor was a Chinese state-sponsored group. https://x.com/AnthropicAI/status/1989033793190277618
AI startup Cursor raises $2.3 billion round at $29.3 billion valuation https://www.cnbc.com/2025/11/13/cursor-ai-startup-funding-round-valuation.html
ByteDance’s Volcano Engine debuts coding agent at $1.3 promo price https://www.techinasia.com/news/bytedances-volcano-engine-debuts-coding-agent-at-1-3-promo-price
ByteDance unveils China’s most affordable AI coding agent at just US$1.30 a month | South China Morning Post https://www.scmp.com/tech/big-tech/article/3332365/bytedance-unveils-chinas-most-affordable-ai-coding-agent-just-us130-month
One of the biggest use cases for agentic document automation is insurance underwriting ✍️ Underwriting depends on processing *massive* volumes of unstructured documents, from medical reports, scanned forms, and way more. It’s also historically been a massively manual process. https://x.com/jerryjliu0/status/1988394058197184923
Build a document understanding agent for SEC filings that uses a multi-step approach with LlamaClassify and Extract to identify the filing type and hand it off to the right extraction agent. Deployed with LlamaAgents. 🔧 Customize extraction schemas to fit your specific data https://x.com/llama_index/status/1988696219015848401
Chart OCR just got a major upgrade with our new experimental “agentic chart parsing” feature in LlamaParse 📈🧪 Most LLMs struggle with converting charts to precise numerical data, so we’ve created an experimental system that follows contours in line charts and extracts https://x.com/llama_index/status/1989060127551549854
We just crossed 2.1 million users vibe coding in AI Studio with hundreds of thousands of apps made every day 🤯 This is just the start but we have been blown away by the reception so far. Keep the feedback coming! https://x.com/OfficialLoganK/status/1986467546355183985
Super excited to announce SIMA 2! It’s a general agent that can understand & reason about complex instructions and complete tasks in simulated game worlds, even ones it has never seen before. Incredible to see how it can learn just from self-play… a crucial step towards AGI https://x.com/demishassabis/status/1989096784870928721
SIMA 2: A Gemini-Powered AI Agent for 3D Virtual Worlds – Google DeepMind https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/
SIMA 2 is our most capable AI agent for virtual 3D worlds. 👾🌐 Powered by Gemini, it goes beyond following basic instructions to think, understand, and take actions in interactive environments – meaning you can talk to it through text, voice, or even images. Here’s how 🧵 https://x.com/GoogleDeepMind/status/1988986218722291877
Our SIMA 2 research offers a strong path towards applications in robotics and another step towards AGI in the real world. Find out more → https://x.com/GoogleDeepMind/status/1988987865401798898
SIMA 2 🤝 Genie 3 We tested SIMA 2’s abilities in simulated 3D worlds created by our world model Genie 3. It demonstrated unprecedented adaptability by navigating its surroundings and took meaningful steps toward goals. https://x.com/GoogleDeepMind/status/1989024090414309622
With Agents it’s the first time we need secure auth for user data where the user might not want to be “visually” involved in the flow, no “login with…”, no clicks, no browser. OAuth wasn’t built for this (it’s redirect-heavy and assumes human eyes), static API keys are way too https://x.com/_philschmid/status/1987889931822236059
.@satyanadella gave me and @dylan522p an exclusive tour of Fairwater 2, the most powerful AI datacenter in the world. We then chatted through Satya’s vision for Microsoft in a world with AGI. 0:00:00 – Fairwater 2 0:04:15 – Business models for AGI 0:13:42 – Copilot 0:20:56 – https://x.com/dwarkesh_sp/status/1988656226989699138
GPT-5, Claude, Kimi, and Gemini: “I can travel back in time to any time before 1500 and change only one thing, what is the single thing you would change, nothing obvious.” https://x.com/emollick/status/1987355374928769395
Comet Assistant puts you in control https://www.perplexity.ai/hub/blog/comet-assistant-puts-you-in-control
Our View on Agentic AI: AI Assistants That Work For You, In Your Favorite Apps | Adobe Blog https://blog.adobe.com/en/publish/2025/10/28/our-view-agentic-ai-assistants-that-work-you-in-your-favorite-apps
NotebookLM adds Deep Research, Docx, Sheets and more https://blog.google/technology/google-labs/notebooklm-deep-research-file-types/
Build a coding agent with GPT 5.1 https://cookbook.openai.com/examples/build_a_coding_agent_with_gpt-5.1
GPT-5.1, GPT-5.1-Codex, and GPT-5.1-Codex-Mini, the full suite of OpenAI’s latest 5.1-series models, are now rolling out in public preview in GitHub Copilot. Try it out in @code. https://x.com/github/status/1989044218451394968
Get the most out of GPT-5.1, based on internal testing and early testing with customers ⬇️ https://x.com/OpenAIDevs/status/1989378869976326570
You can now interrupt long-running queries and add new context without restarting or losing progress. This is especially useful for refining deep research or GPT-5 Pro queries as the model will adjust its response with your new requirements. Just hit update in the sidebar and https://x.com/OpenAI/status/1986194298971590988
We received early access to GPT-5.1 for testing Pace agents are now 50% faster while matching or even exceeding accuracy across our evals! https://x.com/paceagent/status/1989043013356486762
@OpenAIDevs is cooking again: GPT-5.1 performed very well in our first tests! Photo of an espresso machine → “make it a stylized 2.5D version” → done faster than you can pull a good shot of espresso. Handles ambiguous prompts fast and effectively and the responses also feel https://x.com/jetbrains/status/1989049485335429143
This is a really useful addition for Deep Research, but somewhat challenging to use in practice for GPT-5 Pro, since you need to be very good at interpreting its thinking process which can be opaque & which GPT-5 Pro has a tendency not to show after a certain point in any case https://x.com/emollick/status/1986323210288165332
Excited to get @OpenAI’s GPT-5.1 Instant live for our customers and their agents. Nice early results: a 20% improvement in low-latency tool-calling performance compared to GPT-5 (minimal). https://x.com/SierraPlatform/status/1989085128434593816
GPT-5.1 in ChatGPT is rolling out to all users this week. It’s smarter, more reliable, and a lot more conversational. https://x.com/OpenAI/status/1988714373058351213
GPT-5.1 is now available in the API. It’s faster, more steerable, better at coding, and ships with practical new tools. If you’re building apps or agents where intelligence, speed, and cost matter, GPT-5.1 should feel like a meaningful upgrade. https://x.com/OpenAIDevs/status/1989042617750024403
Introducing GPT-5.1 for developers | OpenAI https://openai.com/index/gpt-5-1-for-developers/
GPT-5.1 Prompting Guide https://cookbook.openai.com/examples/gpt-5/gpt-5-1_prompting_guide
OpenAI readies ChatGPT Group Chats with custom controls https://www.testingcatalog.com/openai-readies-chatgpt-group-chats-with-custom-controls/
I finally reached human-level performance (85%) on ARC-AGI v1 for under $10k and within 12 hours. I use the same multi-agent collaboration with evolutionary test-time compute, now powered by GPT-5 pro with lower parallelism. https://x.com/jerber888/status/1987982067116777521
@OpenAI’s GPT-5.1 delivers a solid upgrade from GPT-5 for agentic coding. We’ve noticed that the model is more steerable, overthinks less, and is better at frontend design. The model is also faster on most tasks because it dynamically adjusts reasoning depth based on the https://x.com/cognition/status/1989081722353529178
We’ve been testing Box AI with GPT-5.1 for the past week to compare it to GPT-5 for enterprise content use-cases. It’s a very strong upgrade from GPT-5. It’s super fast, performing ~2X (or more) faster on our tests on long documents (30,000+ tokens); and we saw an 8 percentage point gain in data extraction from our most challenging documents (across 1,000+ data fields) from a variety of content types. https://x.com/levie/status/1989051715207983511
GPT-5 on Sudoku-Bench 🧩 Since releasing Sudoku-Bench in May 2025, when no LLM could solve a classic 9×9 puzzle, we’ve been evaluating the latest generation of models. GPT-5 now leads our leaderboard with 33% puzzles solved–approximately 2x the previous leader–and is the first https://x.com/SakanaAILabs/status/1988080410392404021
GPT-5.1: A smarter, more conversational ChatGPT | OpenAI https://openai.com/index/gpt-5-1/
GPT-5.1 isn’t “GPT-5 but faster.” In our evals of the model, we found it’s the highest-precision model we’ve ever tested for code-related tasks like code review. Less noise, more fixes, reviews that read like patches again. https://x.com/coderabbitai/status/1989035006774354387
GPT-5.1 is a great new model that we think people are going to like more than 5. But with 800M+ people using ChatGPT, one default personality won’t work for everyone. We launched new preset personalities so people can make ChatGPT their own. https://x.com/fidjissimo/status/1988683216681889887
Moving beyond one-size-fits-all – Fidji Simo https://fidjisimo.substack.com/p/moving-beyond-one-size-fits-all
Baidu just dropped an open-source multimodal AI that it claims beats GPT-5 and Gemini | VentureBeat https://venturebeat.com/ai/baidu-just-dropped-an-open-source-multimodal-ai-that-it-claims-beats-gpt-5
agent memory works best when you combine vector search, full-text search, and graph traversal but who wants to maintain a multi-tenant graph database? not me so I built a system to create unlimited on-demand graph databases with @cloudflaredev durable objects https://x.com/boristane/status/1983211987677954384
Great overview from prof. @RLanceMartin on context engineering and how principles like offload, reduce, and isolate show up in modern agent architectures – including our new Deep Agents package + CLI. https://x.com/jakebroekhuizen/status/1989130283866812437
Christoph Meyer and Lars Heling from @SAP, described why agents fail inside complex enterprise systems. “AI agents might struggle in complex systems for two reasons — choosing the correct API to execute, and understanding the business process context.” Lars emphasized that “APIs https://x.com/DeepLearningAI/status/1989397092570104010
Ever wish your AI agent would ask before acting? 🤔 I built a Human-in-the-Loop middleware for @LangChainAI that pauses execution until you approve the next step. 🎥 Watch the full demo → https://x.com/bromann/status/1988653017982226704
Observability for GenAI, Agentic AI, and LLM Workloads https://www.dynatrace.com/info/reports/bring-clarity-to-your-ai-systems/
One of the really fun parts of building AI agents is iterating to figure out when something should be handled by the main agent, should be a deterministic tool call, or be a subagent. * Main agent – great for keeping context and state of the overall workflow, but eventually you https://x.com/levie/status/1985566097152885126
We’ve raised $2.3B in Series D funding from Accel, Andreessen Horowitz, Coatue, Thrive, Nvidia, and Google. We’re also happy to share that Cursor has grown to over $1B in annualized revenue and now produces more code than any other agent in the world. This funding will allow https://x.com/cursor_ai/status/1988971258449682608
Create multi-agent systems where agents dynamically hand off control https://x.com/tom_doerr/status/1986159002242146667
Free PDF on Agent Protocol Landscape: https://x.com/_avichawla/status/1989228348971893236
Engineers became the bottleneck when analyzing sensor data. MOVEdot (@movedot_) built AI agents that fix that, working hand in hand to analyze telemetry, video & documentation 100x faster. Starting with race cars, expanding to manufacturing, robotics, and more. https://x.com/ycombinator/status/1986433394675241359
there’s a new concept I’m seeing emerging in AI Agents (especially coding agents), which I’ll call “harness engineering” – applying context engineering principles to how you use an existing agent Context engineering -> how context (long or short, agentic or not) is passed to an https://x.com/dexhorthy/status/1985699548153467120
Agentic AI: Single vs Multi-Agent Systems 1. Introduction → Agentic AI focuses on building autonomous systems that can perceive, reason, plan, and act independently or collaboratively. → The architecture can involve a single intelligent agent or multiple interacting agents, https://x.com/e_opore/status/1985555189458174323
Why we built LangSmith for improving agent quality As more agents move into production, teams need to move beyond vibe-checking and bring rigor to how they understand agent behavior at scale. In this video, the LangSmith engineering team and Harrison (@hwchase17) sit down to https://x.com/LangChainAI/status/1985740553150218298
🎬 Day 2 of our LangChain middleware series! Today: Tool Call Limit Middleware 🚫Unrestricted tool calling leads to ineffective agents and costly API bills. Watch me use this new middleware to rein in a shopping agent on a spending spree! https://x.com/sydneyrunkle/status/1988667837381242973
Thrilled that our paper, “Multi-Agent Evolve”, was selected as a top AI paper of the week! 📄 Paper: https://x.com/youjiaxuan/status/1985400839221961040
Today, we’re open-sourcing Agentex, the agentic infrastructure layer in the Scale GenAI Platform. Built for developers everywhere, Agentex gives the community transparency and control to help shape what the future of agent infrastructure looks like. https://x.com/scale_AI/status/1988653903504896478
Here’s What’s Next in Agentic Coding – Seconds_0 Substack https://seconds0.substack.com/p/heres-whats-next-in-agentic-coding
RL Environments and the Hierarchy of Agentic Capabilities https://surgehq.ai/blog/rl-envs-real-world
We need more papers like this one which examines how AI agents & humans work together Current agents were fast, but not strong enough to do tasks on their own & approached problems from too much of a programming mindset. But combining human & AI resulted in gains in performance https://x.com/emollick/status/1987151826613833984
Are you wasting half of your time writing database queries? Here’s how search agents are changing that: The Query Agent is an autonomous system that sits on top of your @weaviate_io Cloud data. You ask questions in natural language, and it handles everything in between: • https://x.com/helloiamleonie/status/1989007852502139221
Excited to see FlowAgent by @TearlineAI live — it’s built on LangChain and LangGraph, and helps folks orchestrate complex Web3 tasks. https://x.com/LangChainAI/status/1988012398176071728
Together AI 🤝 @CollinearAI Introducing TraitMix, Collinear’s simulation product empowering teams to generate persona-driven AI agent interactions. 🔌Plug these interactions into your workflows and evaluate their effectiveness with Together Evals. Details: https://x.com/togethercompute/status/1988374675093897380
Semantic search improves our agent’s accuracy across all frontier models, especially in large codebases where grep alone falls short. Learn more about our results and how we trained an embedding model for retrieving code. https://x.com/cursor_ai/status/1986124270548709620
Agents forget everything after each task! Graphiti builds a temporal knowledge graph for Agents that provides a memory layer to all interactions. Fully open-source with 20k+ stars! Learn how to use Graphiti MCP to connect all AI apps via a common memory layer (100% local): https://x.com/_avichawla/status/1986160137363660961
one notable problem about this dream of “continuous learning” models / “self evolving” agents is that the business model of centralized ai right now is quite antithetical to this try telling your GTM people you can’t hype customers about a new version of the product because the https://x.com/swyx/status/1988370167622234524
💻Sandboxes for DeepAgents We’re excited to launch Sandboxes for DeepAgents, a new set of integrations that allow you to safely execute arbitrary DeepAgent code and bash commands in remote sandboxes. Supports @RunloopAI @daytonaio @modal Your DeepAgent runs locally (or https://x.com/LangChainAI/status/1989006586388574397
GEPA featured in @OpenAI and @BainandCompany new cookbook tutorial, showing how to build self-evolving agents that move beyond static prompts. See how GEPA dynamically enables agents to autonomously reflect, learn from feedback, and evolve their own instructions. https://x.com/LakshyAAAgrawal/status/1988008687156556200
Agents 2.0: From Shallow Loops to Deep Agents https://x.com/bibryam/status/1985328607544111224
Introduction to Agents | Kaggle https://www.kaggle.com/whitepaper-introduction-to-agents
Your workspace can now go live 🌱 Publish apps that think, learn, and act in real time. Dashboards, portals, and tools powered by your Projects, Agents, and Automations. Set access, add a password, and more. Built with @Taskade Genesis. Your workspace, alive. https://x.com/Taskade/status/1986214457895494007
Terminal-Bench 2.0 launches alongside Harbor, a new framework for testing agents in containers | VentureBeat https://venturebeat.com/ai/terminal-bench-2-0-launches-alongside-harbor-a-new-framework-for-testing
The Technium: Emotional Agents https://kk.org/thetechnium/emotional-agents/
agentic_protocols_landscape.pdf https://copilotkit-public-assets.s3.us-east-1.amazonaws.com/socials/agentic_protocols_landscape.pdf
ok lots of the most cracked ai ux people at @sync_conf today but i’m going to be thinking about this demo for a long time heres @_adamwiggins_ , @threepointone , me et al grilling Steve about his new collaborative multiagents on Tldraw!! https://x.com/swyx/status/1988771055582998834
Andrew Ng kicked off AI Dev 25 x NYC by explaining why AI continues to accelerate: coding is getting faster, teams can prototype far more quickly, and the real bottleneck is now gathering user feedback. He closed by encouraging attendees to connect, collaborate, and build https://x.com/DeepLearningAI/status/1989400305356697856
H Blog – Holo2 https://www.hcompany.ai/blog/holo2
Would love to see your https://x.com/nbashaw/status/1985375346901004521
Cline has added support for Hermes 4 70b & 405b in their VS Code extension, JetBrains, and CLI – available now wherever you use @cline! → https://x.com/NousResearch/status/1989427241424654534
Past, Present, and Future · Cursor https://cursor.com/blog/series-d
🔥 Anthropic just solved AI’s biggest bottleneck Every agent today burns tokens like fuel every tool call, every definition, every intermediate result jammed into context. Now Anthropic’s introducing the fix: code execution with MCP. Instead of calling tools directly, agents https://x.com/godofprompt/status/1986340782694113505
We’ve added the ability to use plugins in the Claude Agent SDK. This allows you to bring the same extension points from Claude Code like subagents & skills into your agent. For example, this is how you can use our document skills to create docx, ppt and xlsx files https://x.com/trq212/status/1985456238713512440
Claude Code Agents running in Docker containers Here’s how it works under the hood 👇 Full breakdown of running Claude Code Agents in Docker – from container setup to execution and file extraction. https://x.com/dani_avila7/status/1985724708579381273
Claude code is good Codex is good Cursor is good Windsurf is good Cline is good Roo Code is good Kilo is good Amp is good OpenCode is good Aider is good https://x.com/theo/status/1988380210715389958
Fun example of the moving frontier, I have all of these little programs I created back with GPT-4 to solve small problems I had. They worked, but very in-elegantly. I asked Claude Code to go through all of those bits, organize them, improve them, and update them – worked well! https://x.com/emollick/status/1987590146980565092
I keep coming back to GDPval, there is a lot in that paper that sheds light on the coming impact of AI on knowledge work, especially as agentic work starts to become a real thing, replacing the back-and-forth cyborg/centaur prompting we have used for years https://x.com/emollick/status/1988088613125714402
The Next Stage of AI Coding Evaluation Is Here https://news.lmarena.ai/code-arena/
As AIs get smarter & more useful, our benchmarks become less useful. Measuring general knowledge or coding ability gives us only a glimpse into what an AI model can do. Anyone who wants to use AI seriously for real work will need to assess it themselves. https://x.com/emollick/status/1988440050716279110
Giga (@gigaml) is building the next generation of customer support — real-time AI agents that can understand emotion, resolve issues instantly, and scale across the world’s largest enterprises. The team recently raised $61M to power emotionally intelligent, human-quality https://x.com/ycombinator/status/1986453397260644713
⭐️New LangChain Academy Course: LangSmith Essentials⭐️ Testing applications is essential to the development lifecycle, but LLM systems are non-deterministic – you can’t always predict how they will behave. Add multi-turn interactions and tool-calling agents, and testing agents https://x.com/LangChainAI/status/1989025161488793743
More ways to build and scale AI agents with Vertex AI Agent Builder | Google Cloud Blog https://cloud.google.com/blog/products/ai-machine-learning/more-ways-to-build-and-scale-ai-agents-with-vertex-ai-agent-builder?e=48754805
Build automated AI workflows with Gemini and n8n. Deploy n8n on Google Cloud Run to connect and automate different services and apps, creating a simple AI agent with Gemini 2.5. https://x.com/googleaidevs/status/1985406623003639898
Most models: think → tool call → think → tool call K2 Thinking: keeps tool calls inside the reasoning trace so multi-step workflows don’t drift. We’ll show how Moonshot post-trained for agentic tool calling and demo complex workflows running in one model call. https://x.com/togethercompute/status/1988009780149878904
It turns out that Kimi K2 Thinking is also a beast at deep research. It can run 200-300 tool requests for impressive multi-agent capabilities. Would you like to see a code example of it? https://x.com/omarsar0/status/1987912692099682399
Kimi K2 Thinking is impressive. So I built a multi-agent deep researcher, Kimi Deep Researcher. It generates long research reports on any topic, powered by subagents (web searcher, analyzer, and synthesizer). It can do 100s of tool calls per session. Repo soon! https://x.com/omarsar0/status/1988974710592516454
These are pretty impressive benchmarks from a Chinese open weights model. Especially big is the agentic capability, which has generally lagged in the open weights models. Be interesting to see independent confirmation soon, I found K2 a solid, but kind of weird, model to use. https://x.com/emollick/status/1986452925418270871
🚀 Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here. 🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%) 🔹 Executes up to 200 – 300 sequential tool calls without human interference 🔹 Excels in reasoning, agentic search, and coding 🔹 256K context window Built https://x.com/Kimi_Moonshot/status/1986449512538513505
🚀We’re going live with @Kimi_Moonshot on Nov 19 for a technical deep dive on Kimi K2 Thinking Learn about the 1T parameter MoE that allows your AI agent to make 300 tool calls in one run. Register: https://x.com/togethercompute/status/1988009777247510564
from Kimi AMA: – K3 will likely use KDA or some other hybrid attention mechanism – Kimi-K2 will get vision https://x.com/scaling01/status/1987916859400659011
I wonder if part of what makes Kimi K2 Thinking impressive is that it produces a lot more thinking tokens for even minor & non-technical queries than any model I have used. This is the thinking trace for “write me a really good sentence about cheese” it is 1,595 tokens long! https://x.com/emollick/status/1987286609713107261
Try Kimi-K2-Thinking now on Together AI https://x.com/togethercompute/status/1988011880443470217
I’m sorry Kimi bros The problem is and was 100% the OpenRouter API and it’s starting to piss me off that long reasoning always breaks Just use Kimi API for now and not OpenRouter if you have requests that take a lot of reasoning tokens. Simpler requests work fine with https://x.com/scaling01/status/1987938809628291168
since testing Kimi-K2 Thinking I have become very wary of providers on OpenRouter might switch to original provider APIs only they need to do quality testing for every model and provider https://x.com/scaling01/status/1988399213563236810
Kimi K2 Thinking passes the Lem Test the first time, very few models have done so Just like Kimi K2, however, this remains a very weird & interesting model in a way that is hard to benchmark. Its writing is often very good but sometimes doesn’t hold up under close investigation https://x.com/emollick/status/1986552301922738651
Thanks everyone for testing Kimi K2 Thinking and sharing benchmark results! We’ve noticed that benchmark outcomes can vary across providers. Some third-party endpoints show substantial accuracy drops (e.g., 20+ pp), which has negatively affected scores on reasoning-heavy tasks https://x.com/Kimi_Moonshot/status/1987892275092025635
Kimi AMA on K2 Thinking: 1. $4.6M training cost is not an official number 2. Trained on H800s (nerfed H100s) 3. KDA (Kimi Delta Attention) hybrids with NoPE MLA perform better than full MLA with RoPE 4. Muon scales well to 1T parameters. “there are tens of optimizers and https://x.com/Yuchenj_UW/status/1987940704929395187
Test out Kimi K2 Thinking vs. all the frontier models for yourself at: https://x.com/arena/status/1987947224173781185
Testing Kimi K-2 has reminded me of how insane it is that firms picking AIs are treating them as fungible based on benchmarks Kimi & Grok & Claude & every other model have strengths, quirks & weaknesses that can make a big difference in aggregate Develop your own benchmarks! https://x.com/emollick/status/1986604851770360213
In our new Expert and Occupational leaderboards: The previous, non-thinking Kimi K2 is ranked #7 for Hard Prompts, particularly excelling in the ‘Legal & Government’ category under the ‘Occupational’ leaderboard, while falling behind in ‘Instruction Following’. Kimi K2 Thinking https://x.com/arena/status/1987947222299013630
k2 vision is happening. this is not a drill. https://x.com/code_star/status/1987917177417289794
Whenever people ask me, “Is Muon optimizer just hype?” I need to show them this. Muon isn’t just verified and used in Kimi; other frontier labs like OpenAI are using it and its variants. It’s also in PyTorch stable now! https://x.com/Yuchenj_UW/status/1987955443420065816
Latest LisanBench results for Kimi-K2 Thinking Kimi-K2 Thinking is the best open-source model and 7th best model overall, right between GPT-5 and GPT-5-Mini Raw Scores: Glicko-2 ratings – better indicator of relative strength Kimi-K2 Thinking managed to set new high-scores https://x.com/scaling01/status/1987952884927934966
🚨 Leaderboard Update! Kimi K2 Thinking by @Kimi_Moonshot has landed on the Text leaderboard as the #2 open source model (MIT modified), tied for #7 overall. These are real-world results. With only a six-point difference with @Zai_org ‘s GLM 4.6, the competition is tight. Kimi https://x.com/arena/status/1987947219224526902
Samsung Vision AI Companion: Bringing Conversational AI to Households Worldwide – Samsung Global Newsroom https://news.samsung.com/global/samsung-vision-ai-companion-bringing-conversational-ai-to-households-worldwide
A trustworthy AI Assistant must be personal, transparent, and have sound judgement. Comet now shows you exactly what it’s doing, lets you determine how it acts, and asks permission before completing sensitive actions. Read our latest blog post: https://x.com/perplexity_ai/status/1989416343331012971
The Story Behind the TIME AI Agent | TIME https://time.com/7332572/the-story-behind-the-time-ai-agent/
Memory in AI agents seems like a logical next step after RAG evolved to agentic RAG. RAG: one-shot read-only Agentic RAG: read-only via tool calls Memory in AI agents: read-and-write via tool calls Obviously, it’s a little more complex than this. I make my case here: https://x.com/helloiamleonie/status/1985376609935769930
Hey, AI Devs! Don’t sleep on the new Gemini File Search API! Feels like the easiest way to build agentic RAG systems. I built a little MCP server to analyze codebases with semantic search (Gemini File Search) & agentic search. Fun chatting with @karpathy’s nanochat project. https://x.com/omarsar0/status/1988236096195776683
A must-read paper → Fundamentals of Building Autonomous LLM Agents Reviews the core cognitive subsystems that make up autonomous LLM-powered agents, including: – Perception – Reasoning & planning: CoT, MCTS, ReAct, Tree-of-Thought (ToT) techniques – Long- & short-term memory – https://x.com/TheTuringPost/status/1984686406430871892
Another banger whitepaper from Google. This time, they discuss context engineering and how to build effective memory for AI agents. Highly recommended read for AI devs. (bookmark it) I think this is an excellent intro on how to think about memory for AI agents. kaggle. https://x.com/omarsar0/status/1989081828678893837
GLM-4.6 from @Zai_org is out now on Together AI! Built for teams deploying agentic AI workflows at scale — this model achieves near-parity with Claude Sonnet 4 while using 15% fewer tokens. https://x.com/togethercompute/status/1989082601399939312
semtools is the easiest way to let your Claude Code / Cursor become an analyst over 1k+ PDF docs. It just adds two CLI commands: `parse`, `search`. Install it to ~/.zshrc and add it to your https://x.com/jerryjliu0/status/1986244321251893631
Use Cases | Claude https://claude.com/resources/use-cases
dropped everything and sprinted to my PC to verify this sadly just anthropic guys vibe coding a little too much https://x.com/scaling01/status/1989146991272817048
ok one of the things that i’ve always wanted an AIE is coming to pass, after the Great @dylan522p v @jefrankle debate of 2024: the Great MCP debate! @vtahowe and @ianlivingstone are taking on all challengers – if you are a knowledgeable MCP skeptic, come do a live debate next https://x.com/swyx/status/1988345059675435046
Our very own @RLanceMartin outlined a new playbook for AI engineering on the High Signal Podcast. In this conversation, he touches on: 🔶 Why top products from Claude Code to Manus are constantly re-architecting to keep up with tomorrow’s models 🔶 How to use context engineering https://x.com/LangChainAI/status/1989152093127782765
🌟 Announcing a significant upgrade to the Gemini CLI user experience, making your terminal interactions more robust, intuitive, and visually stable. It’s the same powerful Gemini CLI, just dramatically smoother! See all the new upgrades here → https://x.com/googledevs/status/1989119863961337889
🚀Introducing Code Arena: the next generation of live coding evals for frontier AI models. Built to test how models plan, scaffold, debug, and build real web apps step-by-step. Try Claude, GPT-5, GLM-4.6 and Gemini in Code Arena today! https://x.com/arena/status/1988665193275240616
Small upgrade to Codex, we have updated gpt-5-codex within Codex. The model should feel better, more collaborative, and across the board we gained a few percentage points for the evals we care about. Also a tad more token-efficient, needing ~3% less to achieve similar results. https://x.com/thsottiaux/status/1986602121572327650
GPT-5.1 is now available in the API. Pricing is the same as GPT-5. We are also releasing gpt-5.1-codex and gpt-5.1-codex-mini in the API, specialized for long-running coding tasks. Prompt caching now lasts up to 24 hours! Updated evals in our blog post. https://x.com/sama/status/1989048466967032153
Three new OpenAI models are now available in Cursor: 1. GPT-5.1: For everyday tasks like planning and debugging 2. GPT-5.1 Codex: For ambitious coding tasks 3. GPT-5.1 Codex Mini: For cost-efficient changes https://x.com/cursor_ai/status/1989045849003835460
Four new models available in anycoder GPT-5.1-Instant GPT-5.1 GPT-5.1-Codex GPT-5.1-Codex-Mini https://x.com/_akhaliq/status/1989161892880032132
Rolling out to @code now: GPT-5.1, GPT-5.1-Codex, and GPT-5.1-Codex-Mini https://x.com/code/status/1989044946058326370
GPT-5.1 meet Droid. Available for all new users of Factory https://x.com/FactoryAI/status/1989052558279864595
.@OpenAI’s GPT-5.1 is live in Cline. After testing the last several weeks, here’s what we’ve found makes GPT-5.1 an exceptional model in Cline: GPT-5.1 is an obsessive researcher. It reads everything before writing anything. Where other models grab context and go, GPT-5.1 https://x.com/cline/status/1989056367030829458
📢 New Model(s) Drop: GPT-5.1 by @openai is live on Yupp! This is a suite of improved models, specialized in enhanced reasoning and logic, high-performance coding, and creative expression. We explored their capabilities with some prompts on Yupp: https://x.com/yupp_ai/status/1989080371775041942
GPT-5.1 and GPT-5.1-Codex are now live in Windsurf All paid users get free access for the next 7 days. GPT-5.1 is also set as the default model for new users. https://x.com/windsurf/status/1989069991770214580
GPT-5.1 is now live in Augment Code. It’s our strongest model yet for complex reasoning tasks, such as identifying and fixing bugs or complex multi-file edits. Rolling out to users now. We’re excited for you to try it! https://x.com/augmentcode/status/1989044026230862008
You can now test upcoming GPT-5.1 on OpenRouter https://www.testingcatalog.com/you-can-now-test-upcoming-gpt-5-1-on-openrouter/
Okay, so far, GPT-5.1 does hit different. I journal into ChatGPT. (So, yeah, really hope the NYT doesn’t get hold of millions of conversations to sift through.) GPT-4o was a great journaling partner: warm, supportive, with good observations, insights, and feedback. But a huge https://x.com/_simonsmith/status/1988732264516120775
GPT-5.1 is out! It’s a nice upgrade. I particularly like the improvements in instruction following, and the adaptive thinking. The intelligence and style improvements are good too. https://x.com/sama/status/1988692165686620237
🔧 For complex, long-running tasks: – Use a plan tool: break work into milestones with statuses (pending/in-progress/done). – Encourage persistence: “carry it through implementation, verification, explanation” rather than stopping early. – If no reasoning is needed, choose https://x.com/OpenAIDevs/status/1989378875126886560
🚀 Final tip: Treat the prompt as a living document. Test, iterate, refine. Small changes in phrasing or structure often lead to big gains. Check out our Prompt Optimizer tool to apply these best practices to GPT-5.1: https://t.co/T4S3mSDFdf https://x.com/OpenAIDevs/status/1989378876922077560
“I’ve got you, Ron — that’s totally normal, especially with everything you’ve got going on lately.” Who actually wants their model to write like this? Surprised OpenAI highlighted this in the GPT-5.1 announcement. Very annoying IMO. https://x.com/tamaybes/status/1988715705722892371
Great new capability in Databricks powered by our AI research team! We trained a document parsing system that delivers leading quality at 3-5x lower cost and outperforms leading VLMs like GPT-5 and Claude. This is critical to connect AI to so many kinds of data. https://x.com/matei_zaharia/status/1988325177193885885
You can now get more Codex usage from your plan and credits with three updates today: 1️⃣ GPT-5-Codex-Mini — a more compact and cost-efficient version of GPT-5-Codex 2️⃣ 50% higher rate limits for ChatGPT Plus, Business, and Edu 3️⃣ Priority processing for ChatGPT Pro and https://x.com/OpenAIDevs/status/1986861734619947305?s=20
GPT-5.1 is now live in Warp. It’s much faster (40% faster task completion on a subset of SWE-bench Verified) without compromising quality. GPT-5.1 is available to all Warp users, and is now the default model for all new users. https://x.com/warpdotdev/status/1989049715837829326
GPT-5.1 is now available for Perplexity Pro and Max subscribers. https://x.com/perplexity_ai/status/1989075483385069949
I gave codex a markdown to keep track of progress and let it chirp away on a massive linter debt, and it worked all night and fixed around ~6000 linter/type issues. (it would stop but I queued a massive amount of continue’s to keep it working) Part of my prompt was to google whenever it’s stuck and always update the tracker file when it learns something new. This seems to have worked… it’s still working. https://x.com/steipete/status/1986799989775810955
xAI works on Grok Code Remote to rival OpenAI’s Codex https://www.testingcatalog.com/xai-working-on-grok-code-remote-to-rival-openai/
🚀 Qwen DeepResearch 2511 is LIVE! 🚀 We’ve just dropped a major upgrade, making your research deeper, faster, and smarter! 🔗: https://x.com/Alibaba_Qwen/status/1989026687611461705
🚀 Qwen Code v0.2.1 is here! We shipped 8 versions(v0.1.0->v0.2.1) in just 17 days with major improvements: What’s New: 🌐 Free Web Search: Support for multiple providers. Qwen OAuth users get 2000 free searches per day! 🎯 Smarter Code Editing: New fuzzy matching pipeline https://x.com/Alibaba_Qwen/status/1989368317011009901
QwenEdit-2509 Photo2Anime: LoRA transforms photos into anime; delivers better results than prompting for “anime” without it. https://x.com/wildmindai/status/1988309389259010112
New paper! Language has rich, multiscale temporal structure, but sparse autoencoders assume features are *static* directions in activations. To address this, we propose Temporal Feature Analysis: a predictive coding protocol that models dynamics in LLM activations! (1/14) https://x.com/EkdeepL/status/1989009095953895756
At AI Dev 25 x NYC, @ozenhati (Head of Developer Relations, @GroqInc) showed how compound AI systems can build deep-research agents with a single API call. She walked through how agents choose tools, reason over results, and loop until they reach an answer — and why latency https://x.com/DeepLearningAI/status/1989431887224275433




