Image created with gemini-3.1-flash-image-preview (prompt drafted with claude-opus-4.7). Image prompt: High-end product photograph of a long ceramic tasting boat lined with six mini soft-serve sundae curls, each topped with a different global flavor cue — matcha dust, dulce de leche, pistachio, mango ribbon, stroopwafel shard, hibiscus syrup — tiny paper flag toothpicks in each scoop, a passport-style banner reading ‘INTERNATIONAL’ in bold DQ-red lettering arched across the front, a small folded napkin printed ‘Est. 1951 — Milford, DE’, soft directional studio light, glossy macro detail, shallow depth of field, landscape composition, nostalgic 1950s Americana counter aesthetic.

How do people seek guidance from Claude? We looked at 1M conversations to understand what questions people ask, how Claude responds, and where it slips into sycophancy. We used what we found to improve how we trained Opus 4.7 and Mythos Preview.
https://x.com/AnthropicAI/status/2049927618397614466

DeepSeek-V4 pricing gives you glimpses into the future. Imagine, in one year, using a Mythos-level model that can basically code everything for $4/million tokens
https://x.com/scaling01/status/2047707820552831028

You can now run DeepSeek-V4-Flash on a 256GB Mac. Next up: speed 🚀 PR:
https://x.com/Prince_Canuma/status/2047685898163147125

.@deepseek_ai v4 Pro’s checkpoint is both in FP4 and FP8, depending on the layer. This means that the entire model can fit on a single NVIDIA 8xB200 node without trouble. @vllm_project: “Checkpoint is FP4+FP8 mixed: MoE expert weights are stored in FP4 while the remaining
https://x.com/LambdaAPI/status/2047654086263320965

Thoughts after reading the DeepSeek V4 paper: – NVIDIA really is something else. Remember how back in 2024 people were bashing Blackwell as overspec’d and dismissing FP4 as just marketing? Turns out it was all groundwork for the next generation of models. Maybe NVIDIA’s moat is
https://x.com/jukan05/status/2047861732702662741

✨ DeepSeek-V4 is here — a million-token context, 1.6T parameter powerhouse optimized for agentic workflows. Out of the box, on DeepSeek-V4-Pro, NVIDIA Blackwell Ultra delivers over 150 TPS/user interactivity for agentic workflows. And we’re just getting started. Expect these
https://x.com/NVIDIAAI/status/2047765637808664759

It really seems like the US has 3 frontier companies and a horde of low-skill wrappers and cloud providers; without WarClaude, WarGPT and WarGemini, the state would be naked. Something of a Russian situation. I think China has more companies that could do it than big clusters.
https://x.com/teortaxesTex/status/2047835420755415472

China blocks Meta’s $2 billion takeover of AI startup Manus
https://www.cnbc.com/2026/04/27/meta-manus-china-blocks-acquisition-ai-startup.html

GPT Image 2 x Seedance 2.0 x Magnific. It’s crazy how you can turn a shower thought into a realistic cinematic clip! ⬇️ The workflow I used is below:
https://x.com/_OAK200/status/2047616640448078167

🎉 Day-0 vLLM support for the MiMo-V2.5 series! Congrats to @XiaomiMiMo on the open-source release of the MiMo-V2.5 and MiMo-V2.5-Pro. Highlights from the flagship MiMo-V2.5-Pro, an agent-oriented model focused on long-horizon tool use and frontier coding: – Long-horizon task
https://x.com/vllm_project/status/2048825703244972375

Just dropped two open-source models: MiMo-V2.5-Pro (Code Agent, 1T total) and MiMo-V2.5 (Multimodal Agent, 310B total). Oh and one more thing — we’re giving devs & creators 100T tokens on us. Go build something cool 🛠️ 🎁 100T Free Token Grant for Builders
https://x.com/_LuoFuli/status/2048851054662762618

MiMo-V2.5-Pro | Xiaomi
https://mimo.xiaomi.com/mimo-v2-5-pro

Xiaomi MiMo-V2.5 is now officially open-sourced! MIT License, supporting commercial deployment, continued training, and fine-tuning – no additional authorization required. Two models, both supporting a 1M-token context window: • MiMo-V2.5-Pro: built for complex agent and
https://x.com/XiaomiMiMo/status/2048821516079661561

Xiaomi MiMo-V2.5 Series: Pushing Open-Source Agents Forward 🔸 MiMo-V2.5-Pro, our strongest model yet. A major leap from MiMo-V2-Pro in general agentic capabilities, complex software engineering, and long-horizon tasks, now matching frontier models like Claude Opus 4.6 and
https://x.com/XiaomiMiMo/status/2046988157888209365?s=20

Under the directives of the President of the UAE, we launch a new government model. Within two years, 50% of government sectors, services, and operations will run on Agentic AI, making the UAE the first government globally to operate at this scale through autonomous systems. AI
https://x.com/HHShkMohd/status/2047277766769545352?s=20

Remote agents in Vibe. Powered by Mistral Medium 3.5. | Mistral AI
https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5

Tencent has released Hy3-preview, an open weights reasoning model scoring 42 on the Artificial Analysis Intelligence Index, trailing recent open weights peers Hy3-preview is the latest model from @TencentHunyuan. It is a 295B total / 21B active parameter Mixture-of-Experts
https://x.com/ArtificialAnlys/status/2049852417316143393

Yesterday, we shared a chart showing 80% of Claude users live in $100k+ households, more than any other major AI service. But Claude’s user base is smaller than other AI services, so this isn’t the same as being the most popular service among high-income households.
https://x.com/EpochAIResearch/status/2047423836904460328

Scores I would like to see from DeepSeek-V4 to confirm it being less than 6 months behind frontier models ARC-AGI-1: ~75% ARC-AGI-2: ~35% GSO: ~26% METR: 4.5-5 hours WeirdML: ~63% basically Opus 4.5 / GPT-5.2 scores
https://x.com/scaling01/status/2047686712051048598

A few more notes on DeepSeek-V4: – it seems to be a ~GPT-5.2/Opus 4.5+ tier model, so they are still ~4-5 months behind the frontier, but ahead of other chinese labs, with Kimi K2.6 being closest – at 1.6T params they now have a model that’s in the same weight class as GPT-5.4
https://x.com/scaling01/status/2047618271310926151

DeepSeek-V4 is definitely better than GLM-5.1 but not quite Opus 4.7, GPT-5.4 or Gemini 3.1 Pro level unfortunately this video had no comparison to Kimi-K2.6
https://x.com/scaling01/status/2047733998714052819

Mistral Medium 3.5 is interesting less for the benchmarks and more for the positioning. Look at who they’re comparing against: Kimi, Qwen, GLM, Claude (Sonnet). Not GPT, not Gemini. And I don’t mean that in a negative way! With Aleph Alpha being acquired by Cohere last week,
https://x.com/kimmonismus/status/2049545016784413005

1.6T MoE chad vs 128B dense normie insane price-performance mog
https://x.com/scaling01/status/2049546078664397105

anon do you realize that V4-Pro is straight up the strongest pretrained model we have? Like… 1.6T@49AB (≈280B dense), 33T – even by meme formula it’s > LLaMA 3. Add Muon, mHC, most steps 64K context + extended to 1M… No excuses now. Every “unicorn” can have its brand AGI.
https://x.com/teortaxesTex/status/2047630981364883816

DeepSeek-V4 dropped. 1M context. 10x smaller KV cache. First open model where the context window and the agentic post-training meet.
https://x.com/ben_burtenshaw/status/2047646980139016560

Not much detail about the pretraining data unfortunately beyond the standard math, code, webpages etc. Also they use 32T tokens with a total parameter size of 1.6T. That works out to 20 tokens per parameter. Wait a minute….
https://x.com/nrehiew_/status/2047666048334450754
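The arithmetic in the tweet above is easy to check; the 32T-token and 1.6T-parameter figures come from the quoted thread, and the rest is plain division:

```python
# Tokens-per-parameter ratio from the figures quoted above:
# 32T training tokens over 1.6T total parameters.
train_tokens = 32e12   # 32T tokens (from the quoted thread)
total_params = 1.6e12  # 1.6T parameters (from the quoted thread)

ratio = train_tokens / total_params
print(f"{ratio:.0f} tokens per parameter")  # → 20 tokens per parameter
```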

@NousResearch absolutely crushing the 0-day support! Deepseek-v4-pro is live in the Nous Portal 😍 If you want a real personal agent/assistant/quant/researcher/artist/coworker, Hermes Agent continues to deliver!
https://x.com/mr_r0b0t/status/2047673600900010044

🏆 vLLM powers the fastest inference on NVIDIA Blackwell Ultra on Artificial Analysis. On @digitalocean’s Serverless Inference, powered by vLLM on NVIDIA HGX B300: 🥇 AA #1 output speed for DeepSeek V3.2 (230 tok/s, 0.96s TTFT) and Qwen 3.5 397B 🔧 MiniMax-M2.5: 23% TPOT gain
https://x.com/vllm_project/status/2049503979898274163

📊 Day 0 performance is here: DeepSeek-V4-Pro running on NVIDIA Blackwell Ultra. Using @vllm_project’s Day 0 recipe, we’ve captured the initial performance Pareto for DeepSeek’s flagship 1M long-context model. This curve highlights the baseline for balancing AI factory
https://x.com/NVIDIAAI/status/2047823093578518758

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length. 🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world’s top closed-source models. 🔹 DeepSeek-V4-Flash: 284B total / 13B active params.
https://x.com/deepseek_ai/status/2047516922263285776?s=20

🚨 DeepSeek V4 Pro just dropped 75% OFF API pricing + permanent cache price cut to 1/10! 🔥 4/26 Update: Cache permanently 90% cheaper Offer ends May 5, 2026 Insights from Zhihu contributor 普杰 💡 Key Insights: • DeepSeek doesn’t do loss-leader promotions → ¥3 in / ¥6 out ≥
https://x.com/ZhihuFrontier/status/2049027925920637077

8x VLLM CUDA MOAT ALERT: InferenceX has added @deepseek_ai V4 Pro for @vllm_project for day 3 performance across B200, B300, H200, GB200 disagg. We are seeing that B300 is up to 8x faster than H200. The team is working on benchmarking vLLM 0.20 which has the new DeepGEMM MegaMoE
https://x.com/SemiAnalysis_/status/2048957715955765284

Also, deepseek v4 is available as well
https://x.com/Teknium/status/2047798102091067677

And now a new DeepSeek model, and appears to be fully open weights. Good benchmarks, but with open models, that isn’t always as meaningful. Should be live soon to actually try.
https://x.com/emollick/status/2047516272062058890

Another reason I’m watching Delton closely is that the company works closely with Huawei. As DeepSeek’s comments suggest, Huawei’s 950 is expected to enter heavy mass production starting in the second half of this year, right?
https://x.com/jukan05/status/2047823601462812932

Anyone got DeepSeek-V4-Flash running on a Mac yet? 512GB or 256GB or 128GB or smaller?
https://x.com/simonw/status/2047844236142497850

Compressed Sparse Attention. A Faithful Implementation of CSA from the DeepSeek-V4 paper.
https://x.com/arjunkocher/status/2049066844925936041

DeepSeek cuts V4-Pro prices by 75%
https://thenextweb.com/news/deepseek-v4-pro-price-cut-75-percent

DeepSeek is back among the leading open weights models with the release of DeepSeek V4 Pro and V4 Flash, with V4 Pro second only to Kimi K2.6 on the Artificial Analysis Intelligence Index @deepseek_ai has released DeepSeek V4 Pro and V4 Flash. V4 is the first new architecture
https://x.com/ArtificialAnlys/status/2047735160544841953

DeepSeek removed its “Thinking with Visual Primitives” repo. Here’s a paper link if anyone needs to read it.
https://x.com/arjunkocher/status/2049875566678118898

DeepSeek said Pro pricing could fall sharply once Huawei Ascend 950 supernodes are deployed at scale in the second half of the year
https://x.com/scaling01/status/2047760776769720360

DeepSeek staff has deleted the repo and all mentions of the vision paper. What the hell happened? People who got Vision enabled on web: do you still have it?
https://x.com/teortaxesTex/status/2049880056420298995

DeepSeek themselves estimate the gap to be 3-6 months I think it’s on the higher end of that range
https://x.com/scaling01/status/2047626000091971811

DeepSeek trains vision capabilities into their v4 Flash model by having the model directly output bounding boxes and point coordinates of an image during reasoning. This is DeepSeek’s Computer Use Agent.
https://x.com/nrehiew_/status/2049840778491662623

DeepSeek v4 marks the next era of open weight models and is one of the landmark papers for open weight model training. Thread and notes below 🙂
https://x.com/nrehiew_/status/2047665987730993363

DeepSeek V4 just launched on Huawei hardware, and the numbers tell a story the headlines are hiding. • Huawei’s Ascend 910C delivers roughly 60% of the inference power of an Nvidia H100. • Production is capped at 750,000 units this year; Nvidia ships that many in a single
https://x.com/PalwinderCFA/status/2047614823102619974

DeepSeek V4 MLX Quants now on MLX community HF repo, Made possible by @LambdaAPI and @TheZachMueller ❤️ Without a GPU cluster it would take me a week to upload the quants… Model collection 👇🏽
https://x.com/Prince_Canuma/status/2047847095466385899

DeepSeek V4 Open Source + vLLM Support LIVE 🚀 | Technical Breakdown 🧠 Core Insight DeepSeek V4 is built to solve 1M-token long-context inference — the biggest pain point for LLMs today. ⚠️ 2 Key Long-Context Challenges • KV Cache Explosion: KV cache grows linearly with
https://x.com/ZhihuFrontier/status/2047664976215839021
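The “KV cache grows linearly with context” point from the breakdown above can be made concrete with a back-of-the-envelope sketch. Every hyperparameter below (layer count, KV heads, head dim, dtype width) is an illustrative assumption, not DeepSeek-V4’s actual configuration:

```python
def kv_cache_bytes(context_len, n_layers=60, n_kv_heads=8,
                   head_dim=128, bytes_per_elem=2):
    """Per-sequence KV cache size: 2 tensors (K and V) per layer, each of
    shape [n_kv_heads, context_len, head_dim]. All hyperparameter defaults
    here are illustrative assumptions, not DeepSeek-V4's published config."""
    return 2 * n_layers * n_kv_heads * context_len * head_dim * bytes_per_elem

for ctx in (128_000, 1_000_000):
    gib = kv_cache_bytes(ctx) / 2**30
    print(f"{ctx:>9,} tokens -> {gib:,.1f} GiB")
```

Even with these toy numbers, a single 1M-token sequence costs over 200 GiB of KV cache, which is why schemes that compress the per-token KV footprint matter so much at this context length.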

DeepSeek writing quality (at least in Chinese) is good because they’ve been obsessing about data for the entire history of the company (tbh “clean data” is an obvious instinct for algo traders too, but I think this is more about Wenfeng’s purism) and have such job listings
https://x.com/teortaxesTex/status/2047614729145745623

DeepSeek_V4.pdf · deepseek-ai/DeepSeek-V4-Pro at main

DeepSeek-V4 is a full-stack redesign of LLMs around long context + efficiency Here are some of the changes: – Hybrid attention: Compressed Sparse Attention (CSA) + Heavily Compressed Attention (HCA) for long-context efficiency – 1M-token context becomes ~3-10× cheaper in memory
https://x.com/TheTuringPost/status/2048566818118545887

DeepSeek-V4 uses our Hash routing approach developed back in 2021 — see screenshot of their tech report! (Looks like a great model, congrats!) Bonus note: our same blogpost (& paper) back in 2021 also introduced ‘looped transformers’, but we called that staircase & ladder (see
https://x.com/jaseweston/status/2047690308217926055

DeepSeekv4 Pro 1.6T is supported on InferenceX on Day 0! We have already gotten H200 vLLM working and working on @vllm_project & @sgl_project MI355, B200, B300, GB200/300 disaggregated DeepSeekv4 day 0 performance benchmarking too to track the progress of improvement. Thank you
https://x.com/SemiAnalysis_/status/2047726025748930687

Early DeepSeek v4 impressions not great.
https://x.com/mbusigin/status/2047707082007220393

Here’s DeepSeek v4 Pro. Added to the playable gallery as well.
https://x.com/emollick/status/2047527060713664754

I get the impression many Chinese hate Huawei irrationally and suspect it of a conspiracy to deprive DeepSeek of based American chips
https://x.com/teortaxesTex/status/2047631470664020211

I hear similarly it’s not unique to Mythos/5.5 ofc, frontier models have been dealing with >100T for a while, as far as I know. We see even the open source models get close to 50T. A 100T DeepSeek V4 is just V4 + 2 more epochs, 3e25 FLOPs. still below Llama 405B level
https://x.com/teortaxesTex/status/2049830477167526255

I hope the upgrade to DeepSeek v4 will make the bot comments on here more bearable.
https://x.com/emollick/status/2047519187287846937

I’m still confused by some of the decisions done in deepseek v4 Main confusion is why the huge focus on reducing KV cache size when with something like HiSparse u can offload most of ur kv cache (making ur decode compute bound) This also is compensated with a huge 128 heads and
https://x.com/Grad62304977/status/2048785005216723072

interesting that deepseek’s also joined the path of not allowing sampler control on their api. i wonder why and how long this has been there
https://x.com/stochasticchasm/status/2047717161070989499

Introducing DeepSeek V4 Pro, a long-context model with hybrid attention, three reasoning modes, and SOTA coding performance. AI natives can now use DeepSeek V4 Pro on Together AI and benefit from reliable inference for long-horizon coding and agentic workflows.
https://x.com/togethercompute/status/2047743446522224987

its so messed up that deepseek trained on deepseek reasoning traces. has chinese distillation gone too far?
https://x.com/kalomaze/status/2047762970931827125

Jensen was making a good point, but now it’s too late. DeepSeek is fully committed to ditching CUDA. The rest of the Chinese xiaoren ecosystem can be swayed by Hoppers; Wenfeng believes too much in long-termism. After V4, non-CUDA hardware is guaranteed to live and prosper.
https://x.com/teortaxesTex/status/2049185408785998217

Let’s dive deeper into the difference between DeepSeek V4 Pro & V4 Flash by @DeepSeek_AI. – Both support 1M token context and V4 Flash Thinking shifts the price Pareto frontier. V4 Pro ranks ~30 places higher than the V4 Flash variants, but costs 12x more at launch pricing.
https://x.com/arena/status/2047774037204742255

Let’s see DeepSeek are all nice folks and China’s national heroes, Xi is personally a man of integrity, and they’re not starting wars. American society firebombs Sam Altman, Ant is a weird sex cult, and elected US leader is a murderous monke. Why should compute decide this?
https://x.com/teortaxesTex/status/2047645676234846459

looks like the ~Opus 4.5 estimate for DeepSeek-V4 holds for now, at least on SimpleBench
https://x.com/scaling01/status/2047682465624445015

My first two TiKZ Sparks unicorns from DeepSeek v4. (Expert mode, from the DeepSeek site, which is supposed to be v4 Pro according to the release)
https://x.com/emollick/status/2047523193481547929

My quick paper summary: DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) Two new compressed attention mechanisms for long context manifold hyper connections Muon training 32T tokens FP4 Quantization-Aware
https://x.com/iscienceluvr/status/2047514399393579235?s=46

not even DeepSeek has any appetite for doing this again this is evolution tier architecture they’ll refactor it when they get some time
https://x.com/teortaxesTex/status/2047648219081974034

somewhere in france, still awake at sunrise, adding exclamations to their first read of the deepseek technical report, “one of the best i’ve ever read”
https://x.com/morqon/status/2047643246923325833

Surprisingly a lot of info about the data and process (which is unlike some other deepseek papers). On first read, it sounded like they only cared about specific tasks rather than a general multimodal model. On second thought however, I realized these “visual primitives” and
https://x.com/nrehiew_/status/2049840802562740311

TEHRAN, April 29, 2026 — Less than a week after the release of @deepseek_ai DeepSeek v4 Pro, the cracked team at @vllm_project and @inferact has achieved considerable improvement on GB200 (Dynamo+vLLM). This is largely due to the release of vLLM 0.20.0, which comes with MegaMoE
https://x.com/SemiAnalysis_/status/2049578313111216271

Thank you @NVIDIAAI for highlighting vLLM’s day 0 @deepseek_ai support and enhancing the open source inference ecosystem!
https://x.com/vllm_project/status/2047843293447500069

The strongest open-source agentic model is live on Baseten! DeepSeek V4 is a preview of two powerful MoE models: V4-Pro (1.6T params) and V4-Flash (284B params) with 1M context and SOTA open-source performance. This represents a significant jump from V3.2 (which had a 128k
https://x.com/baseten/status/2047779549644243146

This is great – @deepseek_ai V4 supports prefill! 😀 Most other providers have been dropping support for this critically important capability, so wonderful to see at least one company stepping up.
https://x.com/jeremyphoward/status/2049098509530583199

Unless I’m doing it wrong, Kimi K2.6 in Hermes is like 7x slower than DeepSeek V4, not to mention V4-Flash lmao but it can sometimes fix bugs that not even Pro can resolve. it also has some harsh words for them:
https://x.com/teortaxesTex/status/2048820805258059837

vLLM support for DeepSeek V4 base models is on the way! The V4 release includes 4 models: base/instruct × flash/pro. Initial support covers the instruct versions. To extend support to the base models, we worked with @deepseek_ai to add an expert_dtype field in the config, making
https://x.com/vllm_project/status/2048769886483329525

vLLM v0.20.0 is here! 752 commits from 320 contributors (123 new). 🎉 Highlights: DeepSeek V4, Hunyuan v3 preview support, CUDA 13 / PyTorch 2.11 / Transformers v5 baseline, FA4 as default MLA prefill, TurboQuant 2-bit KV (4× capacity), vLLM IR foundation. Thread 👇
https://x.com/vllm_project/status/2048918629144805619

🚀 Sovereign AI for the world. Cohere & Aleph Alpha form transatlantic AI powerhouse anchored in Canada & Germany! Combining our global scale with European R&D excellence to build sovereign, enterprise-grade AI. Security, privacy & trust for businesses & governments worldwide.
https://x.com/cohere/status/2047631725426000268

Competition between Chinese labs is intensifying. Top 3 Open Models in Text Arena are all now sitting just below the top proprietary tier, each leading different real-world categories. All category rankings are based on overall, which includes proprietary models. #1 open
https://x.com/arena/status/2047714237502677405

Germany is the world’s third largest economy after the US and China, and the economic powerhouse of Europe. We are thrilled to have Canada and Germany as deep strategic partners at our back! 🇨🇦 🇩🇪
https://x.com/aidangomez/status/2047651054381052086

Sovereign AI for the World: Cohere and Aleph Alpha to Form Global AI Powerhouse as Nations and Enterprises Demand Control Over Their Technology
https://www.businesswire.com/news/home/20260424174908/en/Sovereign-AI-for-the-World-Cohere-and-Aleph-Alpha-to-Form-Global-AI-Powerhouse-as-Nations-and-Enterprises-Demand-Control-Over-Their-Technology

I wonder how many of the complaints about distillation from the Chinese are actually just them trying to benchmark your garbage to figure out if they mog you or not
https://x.com/yacineMTB/status/2047628416514486661

Mistral Medium 3.5 is out and it’s a dense 128B model
https://x.com/scaling01/status/2049508126081077678

new mistral model: 128B dense with an arch from 3 years ago (llama 2), very low context (128k), priced higher than deepseek v4 pro (1.6T total params, 1M context) and every other oss model that outperforms it this is very sad
https://x.com/eliebakouch/status/2049523829358162027

Tbh Mixtral’s MoE form was downstream of sparse upcycling and it was a… reasonable hack at the time, but it’s so grug-brained that I proposed it well in advance and well-informed people (@main_horse) were like “meh” (yes I’m still smug about it) DSMoE sure changed everything
https://x.com/teortaxesTex/status/2047844368883581404

🤖 : Moonshot AI open-sources Kimi K2.6, a coding and long-horizon agent model that scales agent swarms to 300 concurrent sub-agents across 4,000 coordinated steps.
https://x.com/dl_weekly/status/2048764506105348129

I have said this before for Kimi-K2.6, but this is also valid for DS-V4 Chinese labs are already in or very close to take-off. Meaning they have models that are useful in the development of new models. (but of course their exponential is shifted 5+ months, so they are
https://x.com/scaling01/status/2047625331339661685

Kimi K2.6 is now #1 on OpenRouter’s weekly LLM Leaderboard 🏆 A huge thank you to every developer building with Kimi. We’ll keep our heads down and keep shipping.
https://x.com/Kimi_Moonshot/status/2048693682329776223

(tbh I boringly feel that the truth is in the middle. K2.6 and GLM 5.1 are more polished, even though they have lesser intrinsic capacity. I can’t bring myself to claim that V4 is straight up decisively stronger than them in most coding)
https://x.com/teortaxesTex/status/2047616897256947967

Mistral releases Mistral Medium 3.5, a new vision reasoning model. 🔥 Mistral-Medium-3.5-128B offers highly competitive performance for models 6x its size. Run locally on ~64GB RAM. Guide:
https://t.co/ztAVzgJECr GGUFs:
https://x.com/UnslothAI/status/2049511248623256017

CodexBar 🎚️ 0.23 is out: Mistral support, Claude Designs/Daily Routines usage, Cursor Extra usage, GPT-5.5 pricing, cleaner widgets/menus, and a bunch of reliability fixes.
https://x.com/steipete/status/2048252455817785357

MiMo-V2.5 Pro by @XiaomiMiMo is the #11 model (#3 among open) in Code Arena: Frontend WebDev and has shifted the Pareto frontier with $1 input / $3 output per MToken.
https://x.com/arena/status/2049582973926949116

SGLang and vLLM support for the MiMo-V2.5 series is here. 🙌 Huge thanks to SGLang project from @lmsysorg and @vllm_project for moving fast and helping developers get started with MiMo-V2.5 on day zero.
https://x.com/XiaomiMiMo/status/2048821520798302409

xiaomi mimo v2.5 eval card, pro is 1T total 42B active, omni (video/image/audio) is 310B total 15B active, both have 1M context support they train in FP8, 27T tokens for pro and 48T for the smaller variant. interleaved SWA with an aggressive 6:1 ratio and 128 window size, still
https://x.com/eliebakouch/status/2048845602633433258
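One way to read the “interleaved SWA with an aggressive 6:1 ratio and 128 window size” line above: six sliding-window layers for every one global-attention layer. A toy sketch under that interpretation (the interleaving pattern and the 28-layer example are assumptions for illustration, not Xiaomi’s published config):

```python
def layer_is_global(layer_idx, ratio=6):
    """6:1 interleaving: six sliding-window layers, then one global layer.
    This pattern is an assumed reading of the tweet, not the official config."""
    return (layer_idx + 1) % (ratio + 1) == 0

def attn_window(layer_idx, context_len, window=128):
    """Effective attention span for a layer: the full context if the layer
    is global, otherwise the 128-token sliding window from the tweet."""
    return context_len if layer_is_global(layer_idx) else window

# With a hypothetical 28-layer stack, 4 layers are global
# and 24 use the 128-token window.
n_global = sum(layer_is_global(i) for i in range(28))
print(n_global)  # → 4
```

The appeal of such a ratio is that most layers pay O(window) attention cost regardless of context length, while the sparse global layers preserve long-range information flow.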

Xiaomi’s MiMo V2.5 Pro has landed at 54 in the Artificial Analysis Intelligence Index, tied with Moonshot’s Kimi K2.6 – the current top open weights model. MiMo V2.5 Pro’s weights are expected to be released soon, which would make MiMo V2.5 Pro the first equal open weights model
https://x.com/ArtificialAnlys/status/2047799218828665093?s=20

🚀 Introducing FlashQLA: high-performance linear attention kernels built on TileLang. ⚡ 2-3× forward speedup. 2× backward speedup. 💻 Purpose-built for agentic AI on your personal devices. 💡Key insights: 1. Gate-driven automatic intra-card CP. 2. Hardware-friendly algebraic
https://x.com/Alibaba_Qwen/status/2049462666734026923

$3/million output tokens. Qwen 3.5 Plus is basically a frontier model. Let that sink in.
https://x.com/MatthewBerman/status/2049562998575075526

Alibaba’s Qwen3.6 27B is the new open weights leader under 150B parameters scoring 46 on the Artificial Analysis Intelligence Index, but uses ~3.7x the output tokens and costs ~21x more than Gemma 4 31B (39) to run the full Intelligence Index @Alibaba_Qwen has released two open
https://x.com/ArtificialAnlys/status/2049881951260283097

Pi + local models are definitely really cool! Short demo to clean up my Desktop: > terminal 1: llama-server -hf unsloth/Qwen3.5-9B-GGUF:UD-Q4_K_XL > terminal 2: simply type “pi” and start talking to it
https://x.com/NielsRogge/status/2049128153658839324

Qwen
https://qwen.ai/blog?id=qwen-scope

Qwen 3.6 Flash
https://x.com/scaling01/status/2048730112636473792

Today we’re releasing Qwen-Scope 🔭, an open suite of sparse autoencoders for the Qwen model family. It turns SAE features into practical tools: 🎯 Inference — Steer model outputs by directly manipulating internal features, no prompt engineering needed 📂 Data — Classify &
https://x.com/Alibaba_Qwen/status/2049861145574690992

This is where we are right now. And i’m not gonna lie it feels pretty magical 🧚‍♀️ Qwen3.6 27B running inside of Pi coding agent via Llama.cpp on the MacBook Pro For non-trivial tasks on the @huggingface codebases, this feels very, very close to hitting the latest Opus in Claude
https://x.com/julien_c/status/2047647522173104145


Discover more from Ethan B. Holland
