Image created with gemini-3.1-flash-image-preview, prompted by claude-sonnet-4-5. Image prompt: Photorealistic 4K wide shot of a giant translucent ice globe partially submerged in a frozen winter bay at dusk, continents carved in relief on its surface, dramatic cracks glowing with warm sunset light from within, surrounded by ice chunks with tiny carved national symbols floating in dark water, blue-to-orange gradient sky reflected in still water patches, natural documentary cinematography, physically grounded, landscape format with bold sans-serif INTERNATIONAL text overlaid

SeeDance 2 will be the DeepSeek moment for text-to-video. https://x.com/kimmonismus/status/2021145731319398887

Seedance 2.0 https://seed.bytedance.com/en/seedance2_0

A New AI Video Model From ByteDance is Making Waves | PetaPixel https://petapixel.com/2026/02/09/bytedance-seedance-2-ai-video/

Bytedance shows impressive progress in AI video with Seedance 2.0 https://the-decoder.com/bytedance-shows-impressive-progress-in-ai-video-with-seedance-2-0/

ByteDance’s Seedance 2.0: “Monica’s apartment from the show Friends, except all of the friends are otters wearing wigs. The otter with a Rachel wig says ‘Is anything weird’ and the one with a Joey wig says ‘Nope, all is normal’” Huh. https://x.com/emollick/status/2021411069865099764

Example of Seedance with some consistency issues, but still: “Action sequence shot for a big-budget action movie where two elegantly dressed women on giant snails race slowly around a track as gunners on the snails fire at each other. Lots of quick cuts and action movie clichés” https://x.com/emollick/status/2021432517992280127

I literally cannot get enough of those clips. SeeDance solved the Turing test for text-to-video. https://x.com/kimmonismus/status/2021605142580412558

Runway’s $5.3B valuation fuels world models | The Deep View https://www.thedeepview.com/articles/runway-s-usd5-3b-valuation-fuels-world-models

Seedance 2.0 has passed the uncanny valley for me. It’s so good, I wanna see what kind of dataset it was trained on. https://x.com/maharshii/status/2021549823321886755

SeeDance 2.0: “An anime where an otter goes into a large mech, with lots of quick shots of mechanical parts and gears turning. The otter gives a grim thumbs up, and then pilots the mech, flying into battle against an octopus made of marble.” Again, this was the very first try. https://x.com/emollick/status/2021412306291392535

Seedance: “A documentary about how otters view Ethan Mollick’s ‘Otter Test’, which judges AIs by their ability to create images of otters sitting in planes” Again, first result. https://x.com/emollick/status/2021425594664353963

Seedance: “An influencer in a TikTok video wearing an otter baseball cap showing off the weird swirling vortex they have in their living room. Cheese shoots out of the vortex every few seconds, forcing them to move around the room” Again, very first attempt. https://x.com/emollick/status/2021419361039462520

The new ByteDance SeeDance 2.0 video model is VERY good. This is the very first output from my very first prompt: “A nature documentary about an otter flying an airplane” https://x.com/emollick/status/2021409874832392508

GLM-5 was pre-trained on 28.5T tokens and uses DeepSeek Sparse Attention. https://x.com/scaling01/status/2021627498451370331

Kimi Agent Swarm blog is here 🐝 https://t.co/XjPeoRVNxG Kimi can spawn a team of specialists to: – Scale output: multi-file generation (Word, Excel, PDFs, slides) – Scale research: parallel analysis of news from 2000-2025 – Scale creativity: a book in 20 writing styles https://x.com/Kimi_Moonshot/status/2021141949416362381

Kimi Agent Swarm: 100 Sub-Agents at Scale https://www.kimi.com/blog/agent-swarm

ByteDance’s new model sparks stock rally as China’s AI video battle escalates | South China Morning Post https://www.scmp.com/tech/article/3342932/bytedances-new-model-sparks-stock-rally-chinas-ai-video-battle-escalates?module=top_story&pgtype=section

Apparently SeeDance 2 is down for the time being on BytePlus, but here’s a young wizard with his dragon companion. It’s a very cool video model. This video cost $0.72 USD to generate, which seems reasonable. (300,000 tokens for a 15-second clip at $0.0024 USD per 1,000 tokens.) https://x.com/TomLikesRobots/status/2021504992268492814
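The per-clip cost quoted above is easy to sanity-check; a minimal sketch, using only the token count and per-token price stated in the tweet:

```python
# Sanity-check the quoted per-clip cost:
# 300,000 tokens for a 15-second clip at $0.0024 per 1,000 tokens.
def clip_cost_usd(tokens: int, usd_per_1k_tokens: float) -> float:
    """Generation cost in USD for a clip consuming `tokens` tokens."""
    return tokens / 1000 * usd_per_1k_tokens

print(f"${clip_cost_usd(300_000, 0.0024):.2f}")  # → $0.72
```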

China is on a coordinated offensive — dropping Kling 3.0 and now Seedance 2.0, while we’re still debating release cycles for Sora and Veo. This is an insane leap in cinematic realism — dare I say the nano banana moment for video is here. https://x.com/bilawalsidhu/status/2020943565904330933

I’ve just received access to SeeDance 2. Very first test, and it really does live up to its hype. This is a slow and considered generation, but it is very, very cool. Down with a virus right now, but I’ll enjoy testing this. https://x.com/TomLikesRobots/status/2021347131500667316

Interesting thread from @venturetwins on China winning the AI video race. Having spent years building in AI video, here’s how I see it: the gap isn’t about talent or compute. It’s structural. Chinese models train without IP constraints. Western ones can’t. That’s it. Seedance… https://x.com/brivael/status/2021657223206724073

Seedance 2.0 beats Google’s Veo 3.1 on my video gen reasoning task. It generates 5 coherent tic-tac-toe moves in a row before breaking down. Veo only manages 1 or 2. h/t @janekm for seedancing for me :) https://x.com/paul_cal/status/2021657394166870507

Short writeup today, not much to say until the tech report. Also feel uneasy about how shady Seedance seems to be; either it’s fake or they have the worst PR team in AI. https://x.com/swyx/status/2021500688010969216

Annoyingly, DeepSeek on the API is still V3. And I can get 128K/2024 claims repeated in previous chats, despite the absence of clues in the preceding context. They’re rolling it out very unevenly, cautiously. I guess it’s a repeat of the r1-lite-preview situation. We get a taste. https://x.com/teortaxesTex/status/2021515356951695431

DeepSeek finally has frontier-level attention. Maybe better than “frontier”. They announced the plan to solve attention 13 months ago, in the V3 paper. They’re making progress. https://x.com/teortaxesTex/status/2021578213420405134

DeepSeek has achieved something very special with attention. I haven’t seen a model that’s so proactive with its context. It doesn’t just have full recall, it *inhabits* a context, feels at home there. It reminds me of Anthropic hype about Opus self-awareness. Or of test-time training. https://x.com/teortaxesTex/status/2021579901548081353

DeepSeek V3.2 & 3.2-Speciale: GPT5-High Open Weights, Context Management, Plans for Compute Scaling | AINews https://news.smol.ai/issues/25-12-01-deepseek-32

Fun to see Deep Think’s real-world impact. Check out how it’s helping researchers catch errors in high-level mathematics research papers. As “just” a math undergrad, I couldn’t even dream of doing any of this myself! https://x.com/OriolVinyalsML/status/2021982723733438725

I think we don’t realize the impact that DeepSeek had on the open ecosystem; there is so much from them that you can find in almost every frontier open LLM today. Most of the open frontier models follow the “finegrain + sparse + shared expert” DeepSeek MoE recipe; a lot of… https://x.com/eliebakouch/status/2021577794480644216

I’m sorry but if this is DeepSeek-V4, it is unfortunately over. https://x.com/scaling01/status/2021562929728885166

Tensor parallelism is killing your DeepSeek-V3 throughput. Period. MLA models only have ONE KV head. If you’re using vanilla TP8, you’re just wasting 7/8 of your VRAM on redundant cache. We just shipped the solution in @sgl_project: 1. DPA (DP Attention): zero KV redundancy. https://x.com/GenAI_is_real/status/2021512872027656344
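The 7/8 figure follows directly from MLA having a single (latent) KV head: vanilla tensor parallelism cannot shard a cache with one head, so every rank holds a full copy. A back-of-envelope sketch of that arithmetic (the function and numbers are illustrative, not from SGLang):

```python
# Illustrative arithmetic behind the 7/8 claim: when the KV cache can be
# sharded at most `kv_heads` ways, the remaining tensor-parallel ranks
# end up holding duplicate copies of the same cache data.
def redundant_kv_fraction(tp_degree: int, kv_heads: int = 1) -> float:
    """Fraction of total KV-cache VRAM that is duplicated data."""
    shards = min(tp_degree, kv_heads)  # ranks holding distinct cache data
    return 1 - shards / tp_degree

print(redundant_kv_fraction(8))      # MLA (1 KV head) under TP8 -> 0.875
print(redundant_kv_fraction(8, 8))   # 8 KV heads shard cleanly -> 0.0
```

This is also why the DP-attention approach mentioned in the tweet helps: replicating attention data-parallel instead of tensor-parallel means each request’s cache lives on exactly one rank.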

Within the last few minutes, DeepSeek has been updated. Knowledge cutoff May 2025, context length 1 million tokens. This is likely V4, though it doesn’t admit to being one. https://x.com/teortaxesTex/status/2021511733333131311

Short post about Engram, a recent paper by DeepSeek: it is essentially very similar to SCONE (link below), where the authors train embeddings for a large number of n-grams (e.g. 1B common n-grams like “Alexander the Great”). [1/2] https://x.com/gabriberton/status/2020612533502222459
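The SCONE-style idea the tweet describes is a bank of learned embeddings for frequent n-grams, fetched by lookup at input time and combined with the ordinary token embeddings. A toy sketch of that lookup (the bank contents, sizes, and names here are illustrative, not taken from the Engram paper):

```python
import numpy as np

# Toy sketch of an n-gram embedding bank in the SCONE/Engram spirit:
# common n-grams get their own learned vectors, retrieved by exact
# lookup and added to the token-level representation. Sizes are tiny
# here for illustration; SCONE scales the bank to ~1B common n-grams.
rng = np.random.default_rng(0)
dim = 8
ngram_bank = {
    ("alexander", "the", "great"): rng.normal(size=dim),
    ("new", "york"): rng.normal(size=dim),
}

def ngram_features(tokens, max_n=3):
    """Sum the embeddings of every known n-gram occurring in `tokens`."""
    out = np.zeros(dim)
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            vec = ngram_bank.get(tuple(tokens[i:i + n]))
            if vec is not None:
                out += vec
    return out

feats = ngram_features(["alexander", "the", "great", "was", "born"])
```

The appeal is that the lookup is a cheap hash-table (or sharded-memory) fetch, so capacity grows without adding FLOPs to the forward pass.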

Building the Chinese Room – by SE Gyges https://www.verysane.ai/p/building-the-chinese-room

Mistral’s revenue numbers are impressive and growing insanely fast, but the bit that’s most exciting as an investor is the durability of this revenue. Mistral selects who they work with carefully to ensure they actually achieve AI transformation. https://x.com/paulbz/status/2021537295883481437

Built a helper repo to make MLX Distributed on Apple Silicon way less painful — and used it to run Kimi K2.5 (658GB on disk) across a 4× Mac Studio cluster over Thunderbolt RDMA. It actually scales. Video is live 👇 https://x.com/digitalix/status/2021290293715243261

Introducing Kimi K2.5 on Baseten’s Model APIs with the most performant TTFT (0.26 sec) and TPS (340) on Artificial Analysis. Even among a landscape of incredible open-source models, Kimi K2.5 stands out with its multi-modal capabilities and its ability to accommodate an… https://x.com/basetenco/status/2021243980802031900

Kimi K2.5 is live on @tinkerapi: https://x.com/thinkymachines/status/2020927620872011940

Kimi K2.5 is now live on Qoder. It’s the #1 model on OpenRouter right now, and we’ve got it in AI Chat at 0.3x Credit (Efficient tier). Strong on implementation: SWE-bench Verified 76.8%, great for coding. @Kimi_Moonshot One early user put it well: “Plan with the Ultimate or… https://x.com/qoder_ai_ide/status/2020739503812387074

Mooncake originated from a research collaboration between Kimi (Moonshot AI) and Tsinghua University. It was born from the need to solve the ‘memory wall’ in serving massive-scale models like the Kimi K-Series. Since open-sourcing, it has evolved into a thriving community-driven… https://x.com/Kimi_Moonshot/status/2022109533716533612

Kimi K2.5 + Seedance 2 is a perfect workflow. One prompt = a 100MB Excel storyboard generated with images and prompts, which you can use in Seedance. I just tested with Hitchcock’s Psycho iconic shower scene. My prompt: I would like to create a derivative work of the… https://x.com/crystalsssup/status/2021149326290956353

🤖 From this week’s issue: official blog post announcing Qwen3-Coder-Next, an 80B-parameter coding model achieving competitive performance on SWE-bench (70.6% on Verified) while enabling 10x higher throughput for repository-level agentic workflows. https://x.com/dl_weekly/status/2021690941879250945

🚀 Introducing Qwen-Image-2.0 — our next-gen image generation model! 🎨 Your imagination, unleashed. ✨ Type a paragraph → get pro slides ✨ Describe a scene → get photoreal 2K magic ✨ Add text → it just works (no more glitchy letters!) ✨ Key upgrades: ✅ Professional… https://x.com/Alibaba_Qwen/status/2021137577311600949

A quick update — we’ve fixed a Qwen-Image 2.0 bug in Qwen Chat that impacted: • Classical Chinese poem ordering in image generation • Character consistency during image editing ✅ Patch is live now! https://t.co/DWnxVxa0hY Go test it out and drop us your feedback. https://x.com/Alibaba_Qwen/status/2021510747671720368

Folks ask about training an RLM like it’s a hypothetical. In the paper, we do post-train and release open-weights RLM-Qwen3-8B-v0.1 on HF. It’s a tiny proof of concept, but it was surprisingly easy to get a marked jump in capability. Maybe learning to recurse is not too hard for 8B. https://x.com/lateinteraction/status/2020877152854409691

https://t.co/DIetNHHMp3 Qwen3.5 architecture is out: a vision-language model, hybrid SSM-Transformer using Gated DeltaNet linear attention mixed with standard attention, interleaved MRoPE, and shared+routed MoE experts. https://x.com/QuixiAI/status/2021109801606893837

Kimi K2‑0905 and Qwen3‑Max preview: two 1T open weights models launched | AINews https://news.smol.ai/issues/25-09-05-1t-models

Qwen https://qwen.ai/blog?id=a6f483777144685d33cd3d2af95136fcbeb57652&from=research.research-list

Qwen https://qwen.ai/blog?id=qwen-image-2.0

Qwen https://qwen.ai/blog?id=qwen-image-layered

Qwen-Image: SOTA text rendering + 4o-imagegen-level Editing Open Weights MMDiT | AINews https://news.smol.ai/issues/25-08-04-qwen-image

China’s Alibaba launches AI model to power robots https://www.cnbc.com/2026/02/10/alibaba-ai-model-robotics-rynnbrain-china.html

Non-dilutive grants from local and central governments in China have enabled rapid scaling of infrastructure and R&D in AI and robotics. https://x.com/TheHumanoidHub/status/2021277143188242952

Imho SeeDance looks the most natural, the most human. It’s the little things: the wine moving in the glass, the facial expressions, the details. SeeDance is forcing Google and OpenAI to quickly update their models to Sora 2.5 / Veo 3.2, thus boosting performance. https://x.com/kimmonismus/status/2021176568563785908

Average throughput of GLM-5 on OpenRouter is 14 tps. https://x.com/scaling01/status/2021981416452764058

Build more. Spend less. GLM-5 is now on YouWare. Landing pages, portfolios, prototypes. All handled fast, with a 200K context window. Save your premium credits for the big builds. https://x.com/YouWareAI/status/2021982784948936874

Congrats @Zai_org on GLM-5! Love the permissive MIT license (vs K2.5’s modified MIT). Haven’t chatted with it yet so no vibes, but from the numbers I’m not compelled to switch from @Kimi_Moonshot K2.5: • Similar evals, but GLM-5’s are at bf16 while K2.5’s are at int4 • GLM-5… https://x.com/QuixiAI/status/2021651135615184988

Day-0 with @Zai_org: GLM-5 is live on DeepInfra 🔥 Built for long-horizon agents that plan, orchestrate, and self-correct. Serving ~100 TPS at launch and, as usual, the best price on the market! https://x.com/DeepInfra/status/2021666854088110318

GLM-5 is 2x the total parameters of GLM-4.5 + DeepSeek sparse attention for efficient long context. This is going to be a crazy model. https://x.com/eliebakouch/status/2020824645868630065

“GLM MoE DSA” is landing in transformers 👀 https://x.com/xeophon/status/2020815776890909052

GLM-4.7-Flash-GGUF is now the most downloaded model on @UnslothAI. https://x.com/Zai_org/status/2021207517557051627

GLM-5 is already available on OpenRouter (with even lower prices). https://x.com/scaling01/status/2021637257103651040

GLM-5 has a 200k context length and a maximum output of 128k. https://x.com/scaling01/status/2021628691357298928

GLM-5 is massive. 745B params. LET’S FUCKING GOOOOO. This should be fun! https://x.com/scaling01/status/2020840989947298156

GLM-5 pricing: $1 input and $3.20 output. There is also a GLM-5 Code variant that is more expensive 👀 Almost 8 times cheaper than Opus. https://x.com/scaling01/status/2021628971939418522

GLM-5 runs with mlx-lm on a single 512GB M3 Ultra in Q4. It’s quite good in my initial testing and pretty fast as well. It generated a highly functional Space Invaders game using 7.1k tokens at 15.4 tok/s and 419GB memory. Thanks to @ActuallyIsaak and @kernelpool for the port. https://x.com/awnihannun/status/2022007608811696158

https://t.co/ctlyPtiB3j GLM-5 architecture is out: ~740B parameters, ~50B active, 78 layers, MLA attention lifted from DeepSeek V3, plus DeepSeek V3.2’s sparse attention indexer for 200k context. Basically DeepSeek V3 scale with DSA bolted on. https://x.com/QuixiAI/status/2021111352895393960

DeepSeek V4-lite, Minimax 2.5, GLM-5: what a bloodbath. Will Qwen accelerate the release of 3.5? https://x.com/teortaxesTex/status/2021586965594857487
