Image created with gemini-2.5-flash-image, prompted via claude-sonnet-4-5. Image prompt: A cinematic globe of Earth in Wicked movie style with emerald green and deep purple tones, continents precisely segmented into glowing puzzle-piece sections with visible boundaries, floating in dramatic theatrical spotlight against moody stage background with subtle yellow brick road patterns, the word INTERNATIONAL overlaid as bold movie title text
We are now seeing the first long-anticipated use of AI for semi-autonomous cyberattacks. “This approach allowed the threat actor to achieve operational scale typically associated with nation-state campaigns while maintaining minimal direct involvement” https://x.com/emollick/status/1989045906977747200
Gemini 3 Pro Preview has comparable speeds to Gemini 2.5 Pro, with 128 output tokens per second. This places it ahead of other frontier models including GPT-5.1 (high), Kimi K2 Thinking and Grok 4 https://x.com/ArtificialAnlys/status/1990813128226189811
The Artificial Analysis leaderboard shows Gemini 3 at 73%, GPT-5.1 at 70%, and Kimi at 67% – minor differences. On our leaderboard, Gemini is 47%, GPT-5.1 is 38%, and Kimi is 27% – Gemini 3 is substantially more capable on hard benchmarks. https://x.com/hendrycks/status/1991188104804208736
We estimate that Kimi K2 Thinking has a 50%-time-horizon of around 54 minutes (95% confidence interval of 25 to 100 minutes) on our agentic SWE tasks. Note that we conducted this evaluation through a third-party inference provider, which reduces our confidence in this estimate. https://x.com/METR_Evals/status/1991658241932292537
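For context on the metric: METR's time horizon comes from fitting a success-vs-task-length curve and reading off the task length at which the model succeeds half the time. A minimal sketch of that idea, assuming a logistic curve in log task length (the parameters below are illustrative, chosen only so the horizon lands near the quoted 54 minutes; they are not METR's fitted values):

```python
import math

def success_prob(minutes: float, a: float, b: float) -> float:
    """Logistic success curve in log2 task length: p = sigmoid(a - b*log2(t))."""
    x = a - b * math.log2(minutes)
    return 1.0 / (1.0 + math.exp(-x))

def horizon_50(a: float, b: float) -> float:
    """Task length (minutes) at which p(success) = 0.5, i.e. a - b*log2(t) = 0."""
    return 2.0 ** (a / b)

# Illustrative parameters chosen so the 50% horizon lands near 54 minutes.
a, b = 5.75, 1.0
h = horizon_50(a, b)
print(round(h, 1))                       # ~53.8 minutes
print(round(success_prob(h, a, b), 2))   # 0.5 by construction
```

The 95% confidence interval METR reports reflects uncertainty in the fitted curve parameters, which the third-party inference caveat widens further.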
Kimi K2 Thinking is impressive. So I built a multi-agent deep researcher, Kimi Deep Researcher. It generates long research reports on any topic, powered by subagents (web searcher, analyzer, and synthesizer). It can do 100s of tool calls per session. Repo soon! https://x.com/omarsar0/status/1988974710592516454
🤗 Kimi-K2-Thinking has reached top performance on the latest IMO-level reasoning benchmark, AMO-Bench from Meituan Longcat! https://x.com/Kimi_Moonshot/status/1991139250566545886
Kimi-K2 Thinking gets the same score on METR as Claude 3.7 Sonnet. As I was saying, open source is about 9 months behind the frontier labs on agentic, long-context reasoning tasks. It’s still an improvement, and open-source models seem to be on their own exponential, but I heavily suspect … https://x.com/scaling01/status/1991665386513748172
Open-source research agents have been lagging behind proprietary systems like OpenAI’s Deep Research. The gap has been frustrating for developers who want powerful deep-research agents without vendor lock-in. I’ve been building my own, called Kimi Deep Researcher. Similarly, … https://x.com/omarsar0/status/1990794651608219727
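The searcher/analyzer/synthesizer split described above is a straightforward pipeline to orchestrate. A toy sketch of the control flow, with plain functions standing in for LLM-backed subagents (all names here are illustrative, not the Kimi Deep Researcher API; a real system would back each stage with model calls and web tools, looping for hundreds of tool calls):

```python
def searcher(topic: str) -> list[str]:
    """Stub web searcher: returns raw 'documents' for the topic."""
    return [f"{topic}: finding {i}" for i in range(3)]

def analyzer(docs: list[str]) -> list[str]:
    """Stub analyzer: extracts one note per document."""
    return [d.upper() for d in docs]

def synthesizer(topic: str, notes: list[str]) -> str:
    """Stub synthesizer: assembles the notes into a report."""
    return f"Report on {topic}:\n" + "\n".join(f"- {n}" for n in notes)

def deep_research(topic: str) -> str:
    # Orchestrator: fan work out to subagents, then fold results back in.
    return synthesizer(topic, analyzer(searcher(topic)))

report = deep_research("MoE routing")
print(report.splitlines()[0])  # Report on MoE routing:
```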
If you work with robotics, AV, or 3D vision, this update will save you months of engineering. Most models need complex engineering to get reliable 3D geometry; this one does it with a plain transformer. Depth Anything 3 is the new model from @BytedanceTalk that predicts stable, … https://x.com/IlirAliu_/status/1989622721366446190
Depth Anything 3 proves most 3D vision research has been overengineering the problem. A vanilla DINOv2 transformer plus depth-ray pairs crushes SOTA by 44% on pose and 25% on geometry. One approach covers SOTA monocular depth, multi-view geometry, pose estimation, and novel view synthesis. https://x.com/bilawalsidhu/status/1989444908357488832
ByteDance-Seed/Depth-Anything-3: Depth Anything 3 https://github.com/ByteDance-Seed/Depth-Anything-3
Depth Anything 3 is here! It’s a beefy one! https://x.com/Almorgand/status/1989370456131215514
After a year of team work, we’re thrilled to introduce Depth Anything 3 (DA3)! 🚀 Aiming for human-like spatial perception, DA3 extends monocular depth estimation to any-view scenarios, including single images, multi-view images, and video. In pursuit of minimal modeling, DA3 … https://x.com/bingyikang/status/1989358267668336841
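Once a model like DA3 has predicted a per-pixel depth map, recovering 3D geometry from it is standard pinhole backprojection. A minimal numpy sketch of that downstream step (the camera model and function names are generic illustrations, not DA3's actual API):

```python
import numpy as np

def backproject(depth: np.ndarray, fx: float, fy: float,
                cx: float, cy: float) -> np.ndarray:
    """Lift an HxW metric depth map to an HxWx3 point cloud (pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth  # horizontal offset from principal point, scaled by depth
    y = (v - cy) / fy * depth  # vertical offset
    return np.stack([x, y, depth], axis=-1)

# Toy example: a flat plane 2 m away, 4x4 "image", principal point at center.
depth = np.full((4, 4), 2.0)
pts = backproject(depth, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
print(pts.shape)   # (4, 4, 3)
print(pts[0, 0])   # pixel (0,0) lifts to x = (0-1.5)/2*2 = -1.5, y = -1.5, z = 2.0
```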
supervision-0.27.0 lets you parse and visualize Qwen3-VL object detection results. Prompt: “person between albert and marie”. To answer this, Qwen3 needs prior visual knowledge of Marie Curie and Albert Einstein, and it needs to understand reference terms like “between”. https://x.com/skalskip92/status/1990433442434031737
Grok goes Global with KSA: Announcing our landmark partnership with Saudi Arabia and @HUMAINAI–the first time a country adopts Grok at scale. xAI will build a new generation of hyperscale GPU data centers in the Kingdom, deploying Grok nationwide. https://x.com/xai/status/1991224218642485613
HUMAIN and xAI Partner to Build Next-Generation AI Compute Power and Deploy Grok in the Kingdom to Support the ‘Most AI-Enabled Nation’ Objectives https://www.humain.com/en/news/humain-and-xai-partner-to-build-next-generation-ai-compute-power-and-deploy-grok-in-the-kingdom-to-support-the-most-ai-enabled-nation-objectives
Grok goes Global with KSA | xAI https://x.ai/news/grok-goes-global
Excited to announce Sakana AI’s Series B! 🐟 From day one, Sakana AI has done things differently. Our research has always focused on developing efficient AI technology sustainably, driven by the belief that resource constraints-not limitless compute-are key to true innovation. https://x.com/hardmaru/status/1990204623471395284
Sakana AI takes crown as Japan’s most valuable unicorn – Nikkei Asia https://asia.nikkei.com/business/technology/artificial-intelligence/sakana-ai-takes-crown-as-japan-s-most-valuable-unicorn
DeepSeek is back with a new open source repository. “A parallel load balancer that optimizes workload distributions for MoE models” https://x.com/scaling01/status/1991067602467131704
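For intuition about what an MoE load balancer does: experts receive skewed token counts, and the balancer spreads them across devices so no single device becomes the bottleneck. A toy sketch using greedy longest-processing-time assignment (an illustration of the problem, not DeepSeek's actual algorithm):

```python
import heapq

def balance(loads: list[float], n_devices: int) -> list[list[int]]:
    """Greedy LPT: place each expert (heaviest first) on the least-loaded device."""
    heap = [(0.0, d) for d in range(n_devices)]   # (total load, device id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_devices)]
    for expert in sorted(range(len(loads)), key=lambda e: -loads[e]):
        total, d = heapq.heappop(heap)
        assignment[d].append(expert)
        heapq.heappush(heap, (total + loads[expert], d))
    return assignment

# Eight experts with skewed token counts, spread over two devices.
loads = [40, 10, 30, 20, 5, 25, 15, 35]
groups = balance(loads, 2)
per_device = [sum(loads[e] for e in g) for g in groups]
print(per_device)  # close to an even 90/90 split of the 180 total
```

Real MoE balancers also have to respect constraints this sketch ignores, such as expert replication and rebalancing as token distributions shift between batches.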
Sakana AI announces its Series B funding round https://x.com/SakanaAILabs/status/1990212217216880829
Sakana AI to expand into defense applications alongside its finance focus; a US CIA-linked firm also rates its technology highly https://x.com/nikkei/status/1990296670203195623
Sakana AI raises $135M Series B at a $2.65B valuation to continue building AI models for Japan https://x.com/TechCrunch/status/1990388003525787710
Sakana AI raises $135M Series B at a $2.65B valuation to continue building AI models for Japan | TechCrunch https://techcrunch.com/2025/11/17/sakana-ai-raises-135m-series-b-at-a-2-65b-valuation-to-continue-building-ai-models-for-japan/
Perplexity Pro and Max subscribers now have access to Kimi-K2 Thinking and Gemini 3 Pro. https://x.com/perplexity_ai/status/1991614227950498236
🚨 Leaderboard Update: new model provider in the Arena. @DeepCogito has released Cogito v2.1 (MIT licensed). 🔹 Top 10 open-source model for WebDev, at rank #10. 🔹 Ties for rank #18 overall for WebDev. This puts Cogito v2.1 on par with community favorites like Qwen 3 Coder Plus and Kimi K2. https://x.com/arena/status/1991211903331496351
As promised, Kimi K2 and Gemini 3 Pro are available for all Perplexity Pro and Max users. Grok 4.1 will be available soon. https://x.com/AravSrinivas/status/1991619527638151665
Excited to share another milestone! Our newly released ERNIE-5.0-Preview-1120 has entered the @arena Vision Leaderboard for the very first time! It lands straight in the Top 15 with a score of 1206, on par with Claude Sonnet 4 and GPT-5-high! 🚀 ERNIE-5.0 is natively … https://x.com/ErnieforDevs/status/1991898146981789718
I love Qwen3-VL, but for some reason the 2B model blows past 80 GB of VRAM on simple SFT (NF4 QLoRA with flash attention installed). https://x.com/mervenoyann/status/1990172603437175147
10,000,000 users creating with Qwen Chat — and we’re just getting started. From here, let’s begin — https://x.com/Alibaba_Qwen/status/1990322403994657091
This weekend I evaluated the latest Qwen3-VL models for semantic object detection and built a HF Space to compare Qwen3/2.5/2 side by side. 👉 https://x.com/darius_morawiec/status/1990225022766719335
Today, we are officially open-sourcing a set of high-quality speculator models on the @huggingface Hub. Our first release includes Llamas, Qwens, and gpt-oss. In practice, you can expect 1.5-2.5× speedups on average, with some workloads seeing more than 4× improvements! https://x.com/_EldarKurtic/status/1991160711838359895
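Speculator models speed up decoding via speculative decoding: a small draft model proposes several tokens, and the target model verifies them in one pass, keeping the matching prefix and correcting the first mismatch. A toy greedy-verification sketch (production systems verify with rejection sampling over full distributions and batch the target's forward pass; everything here is illustrative):

```python
def speculative_step(draft_next, target_next, prefix, k):
    """One greedy speculative-decoding step.
    draft_next / target_next: callables mapping a token sequence to its next token.
    Returns the extended prefix and how many draft tokens were accepted."""
    # Draft proposes k tokens autoregressively.
    proposal = list(prefix)
    for _ in range(k):
        proposal.append(draft_next(proposal))
    drafted = proposal[len(prefix):]
    # Target verifies each position: keep the matching prefix, correct one token.
    out = list(prefix)
    accepted = 0
    for tok in drafted:
        expected = target_next(out)
        if tok == expected:
            out.append(tok)
            accepted += 1
        else:
            out.append(expected)  # target's token replaces the first mismatch
            return out, accepted
    return out, accepted

# Toy models over integer tokens: target counts by 1; draft agrees except after 3.
target = lambda seq: seq[-1] + 1
draft = lambda seq: seq[-1] + (2 if seq[-1] == 3 else 1)
print(speculative_step(draft, target, [1], k=4))  # ([1, 2, 3, 4], 2)
```

The speedup comes from the target model checking k drafted tokens in one forward pass instead of k sequential passes, which is why agreement rate between speculator and target determines the 1.5-2.5× figures quoted above.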




