Image created with gemini-2.5-flash-image and claude-sonnet-4-5. Image prompt: Exploded isometric technical diagram of a wooden chess board with transparent layered construction blueprints floating above, chess pieces disassembled showing internal structure, annotation lines and measurements, collaborative maker workshop lighting, documentary photography style with shallow depth of field, warm wood tones with crisp white technical drawings overlay

Airbnb CEO Brian Chesky: “We’re relying a lot on Alibaba’s Qwen model. It’s very good. It’s also fast and cheap… We use OpenAI’s latest models, but we typically don’t use them that much in production because there are faster and cheaper models.” The valley is built on Qwen? https://x.com/natolambert/status/1980657338726887662

It makes perfect sense to let agents understand, imitate, and learn how humans use computers from videos! We present VideoAgentTrek, which builds strong computer-use agents through video pretraining and agentic tuning. This approach has already proven effective in the training of… https://x.com/huybery/status/1981728838024560669

This AI trading benchmark is interesting. Each model got $10,000 to invest. ~3 days in, the current ranking: – DeepSeek V3.1: +$2,658 – Grok 4: +$2,236 – Claude 4.5 Sonnet: +$1,911 – Qwen 3 Max: −$211 – GPT-5: −$3,139 – Gemini 2.5 Pro: −$3,719 DeepSeek beats all the other models https://x.com/Yuchenj_UW/status/1980318499185823760
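
For a rough sense of scale, here is a minimal sketch that converts the dollar figures above into percentage returns on the $10,000 stake. The P&L numbers are copied straight from the tweet; three days of trading is far too small a sample to rank models, so treat the output as noise-prone.

```python
# Percentage returns implied by the benchmark standings quoted above.
# Each model started with $10,000; the dollar P&L figures come from the tweet.
STARTING_CAPITAL = 10_000

pnl_usd = {
    "DeepSeek V3.1": 2_658,
    "Grok 4": 2_236,
    "Claude 4.5 Sonnet": 1_911,
    "Qwen 3 Max": -211,
    "GPT-5": -3_139,
    "Gemini 2.5 Pro": -3_719,
}

for model, dollars in pnl_usd.items():
    print(f"{model:>18}: {dollars:+,} USD ({dollars / STARTING_CAPITAL:+.1%})")
```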

Qwen Deep Research just got a major upgrade. ⚡️ It now creates not only the report, but also a live webpage 🌐 and a podcast 🎙️ – Powered by Qwen3-Coder, Qwen-Image, and Qwen3-TTS. Your insights, now visual and audible. ✨ 👉 https://x.com/Alibaba_Qwen/status/1980609551486624237

Congrats to @hwchase17 @veryboldbagel @ChesterCurme and the rest of the @LangChainAI team! Choosing LangChain, LangGraph, and LangSmith to power @bizzen_ai agents was a no-brainer. It’s been incredible to see how far your ecosystem has come… https://x.com/jhhayashi/status/1980690375326278107

🔎🤖LangSmith Insights Agent Really excited to launch our first in-product agent This agent lives inside LangSmith and combs through traces, giving you insights into: 🧑‍🤝‍🧑how users are using your agent ⁉️how your agent may be messing up 🛃{your custom insight here} The problem https://x.com/hwchase17/status/1981390508841980332

Announcing Open Agent Builder – A @firecrawl_dev powered n8n-style workflow builder example app Build AI agent workflows with a visual canvas by connecting Firecrawl, LLMs, logic nodes, and MCPs, then deploy as an API. Fork the repo and build your own workflow app today 👇 https://x.com/CalebPeffer/status/1978852506286571737

LangSmith processes nearly 1 billion events and 10s of TB of data every day! In order to help users sift through all of the data they’re sending us, we launched Insights Agent! This agent automatically processes your traces in the background and gives you insights into how users… https://x.com/ankush_gola11/status/1981408009097265344

🥳Announcing LangChain and LangGraph 1.0 LangChain and LangGraph 1.0 versions are now LIVE!!!! For both Python and TypeScript Some exciting highlights: – NEW DOCS!!!! – LangChain Agent: revamped and more flexible with middleware – LangGraph 1.0: we’ve been really happy with https://x.com/hwchase17/status/1981030005229670438

Insights Agent & Multi-turn Evals Agents run for a long period of time and have multiple interactions with users, tackling a wide variety of problems. Today we’re launching two new features in LangSmith, our agent engineering platform, so you can better understand your agent https://x.com/LangChainAI/status/1981390300502487370

Yesterday: $125M fundraise + v1.0 release 🚀 Today: the story behind it. In this new @LangChainAI video, we go deep on: • why we built LangGraph • what’s new in LangChain 1.0 • and how the new createAgent + middleware make agents reliable & controllable. 🎥 Watch here → https://x.com/bromann/status/1981076440780013666

LangChain released a feature that clusters agent traces by behaviour patterns. I just ran it on 500 production traces. Here’s what it actually does. It’s an LLM analyzing your LLM traces. Specifically: 1. Takes your production traces (inputs, outputs, tool calls, intermediate… https://x.com/koylanai/status/1981444604869087624
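
The thread above describes the mechanism: flatten each trace to text, featurize it, and group similar behaviours. Here is a minimal, self-contained sketch of that idea (not LangSmith's implementation: plain TF-IDF plus k-means stands in for the LLM summarization step, and the trace strings are invented for illustration).

```python
# Hypothetical illustration of "cluster traces by behaviour": flatten traces to
# text, featurize them, and group similar ones. LangSmith's Insights Agent uses
# an LLM for this; TF-IDF + k-means is only a stand-in to show the pipeline shape.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

traces = [  # made-up production traces: input + tool calls + outcome, flattened
    "user asks to cancel subscription; agent calls billing.cancel; success",
    "user asks to cancel plan; agent calls billing.cancel; success",
    "user asks for refund; agent calls billing.refund; tool error; retries; fails",
    "user asks for refund; agent calls billing.refund; tool error; gives up",
    "user asks how to export data; agent answers from docs; success",
]

features = TfidfVectorizer().fit_transform(traces)
labels = KMeans(n_clusters=3, n_init="auto", random_state=0).fit_predict(features)

for cluster_id in sorted(set(labels)):
    print(f"cluster {cluster_id}:")
    for text, label in zip(traces, labels):
        if label == cluster_id:
            print("  -", text)
```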

We’ve heard your feedback loud and clear. Today, we’re launching 1.0 versions (in both Python and TypeScript) of LangChain and LangGraph, the two most popular agent frameworks, based on what the community has been asking for. • LangGraph: Low-level agent orchestration with https://x.com/LangChainAI/status/1981030195873333269

Sharing some news…LangChain just raised a $125M Series B at a $1.25B valuation 🔥Along the way, our frameworks have grown to 85M+ monthly downloads with 35% of the F500 using one of our products. If you’re interested in joining us, we’re hiring!: https://x.com/amadaecheverria/status/1980687050174287876

LangChain has raised a $125M Series B, valuing the company at $1.25B 🦜🔗 It’s been 3 years since our first commit, and the progress since then has been humbling. Today, our open source frameworks see 85M+ downloads a month and our products are used by half of the Fortune 500. https://x.com/veryboldbagel/status/1980686379613815295

LangChain’s fundraise comes with a wave of product announcements. Would love to hear feedback from folks. On the OSS side, you can check out the 1.0 release notes here: https://x.com/chester_curme/status/1980685592544571897

🔥Today we’re excited to announce new funding for LangChain (at a $1.25B valuation) to allow us to build the platform for agent engineering. LangChain started as a single Python package 3 years ago. Since then, we’ve evolved into a comprehensive platform for agent engineering https://x.com/hwchase17/status/1980680421706006663

IVP is investing in @LangChainAI because it’s a category-defining platform for AI agents — and because of the founders building it. Over the past two years, we’ve gotten to know @hwchase17 and @ankush_gola11, who together have built something rare: a developer platform that’s https://x.com/tomloverro/status/1980714285140701362

We raised $125M to build the platform for agent engineering. Thank you to our investors (@IVP, @sequoia, @benchmark, @AmplifyPartners, @SapphireVC, @CapitalG, and more) for their belief in us, and to our customers like @Replit, @clay, @TrustVanta, @Cloudflare, @Rippling, @Cisco, https://x.com/LangChainAI/status/1980678921839603948

Big milestone for @LangChainAI 🚀 We just announced our $125M Series B and the v1.0 release of LangChain & LangGraph — marking the start of a new era for agent engineering. Proud to have contributed to the new createAgent abstraction in LangChainJS — with middleware that gives… https://x.com/bromann/status/1980683275682091024

Introducing Qwen3-VL-2B and Qwen3-VL-32B! From edge to cloud, these dense powerhouses deliver ultimate performance per GPU memory, packing the full capabilities of Qwen3-VL into compact and scalable forms. 🔥 Qwen3-VL-32B outperforms GPT-5 mini & Claude 4 Sonnet across STEM, https://x.com/Alibaba_Qwen/status/1980665932625383868

🚨 WebDev Arena: Top 15 Disrupted! 4 new models have been added to the WebDev leaderboard: 🔸 #4 Claude Sonnet 4.5 Thinking 32k by @AnthropicAI 🔸 #4 GLM 4.6 (the new #1 open model) by @Zai_org 🔸 #11 Qwen3 235B A22B Instruct (and #7 open model) by @Alibaba_Qwen 🔸 #14 Claude https://x.com/arena/status/1980367208300835328

It looks like the sandboxing tool can be useful for general agent builders. So what does Anthropic do? They open-source it. https://x.com/omarsar0/status/1980408741007876183

Thanks for sharing the internal benchmarks, @rauchg ! We love to see it. 🔥 https://x.com/Kimi_Moonshot/status/1980219115840385349

Why Cohere’s ex-AI research lead is betting against the scaling race | TechCrunch https://techcrunch.com/2025/10/22/why-coheres-ex-ai-research-lead-is-betting-against-the-scaling-race/

DeepSeek’s new 685B MoE model attends only to the most relevant tokens, delivering 2–3× faster long-context inference and 6–7× cheaper processing than its V3.1 model. The new V3.2 model has MIT-licensed weights, costs $0.28/$0.028/$0.42 per 1M input/cached/output tokens via https://x.com/DeepLearningAI/status/1980846573681520824
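
A quick back-of-the-envelope on those prices ($ per 1M tokens: 0.28 input, 0.028 cached input, 0.42 output). Only the per-token prices come from the tweet; the request mix below is invented purely for illustration.

```python
# Cost of one hypothetical long-context request at the quoted V3.2 API prices.
PRICE_PER_MILLION = {"input": 0.28, "cached_input": 0.028, "output": 0.42}  # USD

def request_cost_usd(tokens: dict[str, int]) -> float:
    """USD cost for one request given token counts per pricing category."""
    return sum(count / 1_000_000 * PRICE_PER_MILLION[kind] for kind, count in tokens.items())

# e.g. 100k fresh input tokens, 400k cache hits, 2k output tokens
example = {"input": 100_000, "cached_input": 400_000, "output": 2_000}
print(f"${request_cost_usd(example):.4f} per request")  # ~$0.04
```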

Massively unexpected update from DeepSeek: a powerful, high-compression MoE OCR model. > In production, DeepSeek-OCR can generate 33 million pages of data per day for LLMs/VLMs using 20 nodes (x8 A100-40G). They want ALL the tokens. You’re welcome to have some too. https://x.com/teortaxesTex/status/1980160624140456370

DeepSeek released an OCR model today. Their motivation is really interesting: they want to use visual modality as an efficient compression medium for textual information, and use this to solve long-context challenges in LLMs. Of course, they are using it to get more training https://x.com/iScienceLuvr/status/1980247935700066468

Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs Looks like this paper is also exploring the direction DeepSeek is interested in: representing text more efficiently as images, observing an almost 50% reduction in the number of tokens https://x.com/iScienceLuvr/status/1980942325573648703

> by storing the data representation natively as image tiles This must be obvious but just to clarify: DeepSeek does not propose to store *screenshots* of your chat logs. Pixel representation can be ephemeral; what is stored is still tokens, just not *language* tokens. https://x.com/teortaxesTex/status/1980453820632297900

Karpathy is undoubtedly vision pilled. And thanks to this casual DeepSeek drop — so will you. https://x.com/bilawalsidhu/status/1980598830916939880

DeepSeek https://github.com/deepseek-ai/

@mervenoyann The good perf of DeepSeek models matches with what we observe on PrediBench! https://x.com/AymericRoucher/status/1980196484617523445

Again, I will reiterate this: DeepSeek was literally built by China’s top quant firm, and China’s TOP quants. https://x.com/hamptonism/status/1980182896049811780

After DeepSeek-V3.2-Exp added TileLang & CUDA ops, many asked: what exactly is TileLang? 🤔 In his post “TileLang: 80 lines of Python kernel code to reach 95% of FlashMLA’s performance”, developer & Zhihu contributor ryume gives a full breakdown of this new AI programming https://x.com/ZhihuFrontier/status/1980170674112188440

GLM-4.6 providers overview: we are benchmarking API endpoints offered by Baseten, GMI, Parasail, Novita, Deepinfra GLM-4.6 (Reasoning) from @Zai_org is one of the most intelligent open weights models, with intelligence close to GPT-OSS-120b (high), DeepSeek V3.2 Exp (Reasoning) https://x.com/ArtificialAnlys/status/1980777360724226282

For people thinking that DeepSeek-OCR is the first model to render text as images, the University of Copenhagen already did this in 2023. The paper is called “Language Modelling with Pixels”. They trained a Masked AutoEncoder (MAE) by rendering text as images and masking patches https://x.com/NielsRogge/status/1980559120760791125

We’re seeing a lot of usage around DeepSeek’s new OCR model. Alex packaged it so you can deploy and test it yourself – prompts and sample images included. https://x.com/basetenco/status/1980924381217104338

DeepSeek-OCR looks impressive, but its core idea is not new. Input “Text” as “Image” — already explored by: LANGUAGE MODELING WITH PIXELS (Phillip et al., ICLR 2023) CLIPPO: Image-and-Language Understanding from Pixels Only (Michael et al. CVPR 2023) Pix2Struct: Screenshot https://x.com/awinyimgprocess/status/1980506449706119642

A more serious thread on the DeepSeek-OCR hype / serious misinterpretation going on. 1. On token reduction via representing text in images, researchers from Cambridge have previously shown that 500x prompt token compression is possible (ACL’25, Li, Su, and Collier). Without https://x.com/Kangwook_Lee/status/1980709454522744902

DeepSeek finally released a new model and paper. And because this DeepSeek-OCR release is a bit different from what everyone expected, and DeepSeek releases are generally a big deal, I wanted to do a brief explainer of what it is all about. In short, they explore how vision https://x.com/rasbt/status/1980642191950090585

I quite like the new DeepSeek-OCR paper. It’s a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn’t matter. The more interesting part for me (esp as a computer vision at heart who is temporarily masquerading as a natural language… https://x.com/karpathy/status/1980397031542989305

🚨 DeepSeek just did something wild. They built an OCR system that compresses long text into vision tokens literally turning paragraphs into pixels. Their model, DeepSeek-OCR, achieves 97% decoding precision at 10× compression and still manages 60% accuracy even at 20×. That… https://x.com/godofprompt/status/1980233080213590326
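
Token arithmetic behind those compression figures, assuming a page that would otherwise cost about 2,000 text tokens (that page size is my assumption; the compression ratios and accuracy numbers are the ones quoted above).

```python
# If a page costs `text_tokens` as plain text, optical compression at ratio r
# needs roughly text_tokens / r vision tokens. The page size below is an assumption.
text_tokens = 2_000

for ratio, reported in [(10, "97% decoding precision"), (20, "~60% accuracy")]:
    vision_tokens = text_tokens / ratio
    print(f"{ratio}x: ~{vision_tokens:.0f} vision tokens instead of "
          f"{text_tokens} text tokens ({reported}, per the tweet)")
```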

Letsss gooo! DeepSeek just released a 3B OCR model on Hugging Face 🔥 Optimised to be token efficient AND scale ~200K+ pages/day on A100-40G Same arch as DeepSeek VL2 Use it with Transformers, vLLM and more 🤗 https://x.com/reach_vb/status/1980170192392270227

a bunch of OCR models released in the past few weeks: ~ deepseek-ocr-3b ~ olmo-ocr-2-7b ~ chandra-ocr-8b ~ nanonets-ocr2-3b ~ paddleocr-vl-0.9B ~ qwen3-vl-dense/moe (general vlm) ~ dots.ocr-3b Will be dropping a detailed comparison soon https://x.com/HarveenChadha/status/1981055277408669934

NEW DeepSeek OCR model that outperforms dots ocr while prefilling 3x less tokens https://x.com/casper_hansen_/status/1980166248878203093

DeepSeek-OCR has some weird architectural choices for the LLM decoder: DeepSeek3B-MoE-A570M -> uses MHA, no MLA (not even GQA?) -> 2 shared experts (like DeepSeek V2, but V3 only has 1) -> quite low sparsity, activation ratio is 12.5%. For V3 it’s 3.52%, for V2 it’s 5% -> not https://x.com/eliebakouch/status/1980193125202083951

I think Glyph coming out on the same day a) corroborates the results of DeepSeek OCR b) confirms the “they had it lying around for a while” suspicion. Charitably, they learned of Zhipu’s project retracing their steps and sped up the release. Other possibilities are obvious too. https://x.com/teortaxesTex/status/1980642000006451348

deepseek-ai/DeepSeek-OCR: Contexts Optical Compression https://github.com/deepseek-ai/DeepSeek-OCR

what happened this week with OCR and VLMs? * deepseek-ocr * chandra-ocr * nanonets-ocr2 * paddleocr-vl * qwen3-vl (2B, 32B, Instruct and Thinking) * dots.ocr * olmOCR 2 (based on Qwen2.5-VL) * LightOnOCR (smallies) top 5 trending models on @huggingface are still OCR/VLM! https://x.com/MaziyarPanahi/status/1981421331053760775

DeepSeek-OCR Contexts Optical Compression https://x.com/_akhaliq/status/1980260630780162505

DeepSeek OCR dropped … but honestly, Glyph [1], released the same day, showed something more interesting: 3–4× context compression and infilling cost reduction, no performance hit on long-context QA and summarization, which is much less trivial than OCR in many cases. If that https://x.com/arankomatsuzaki/status/1980722682246398069

🚀 DeepSeek-OCR — the new frontier of OCR from @deepseek_ai , exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8.5 for day-0 model support. 🧠 Compresses visual contexts up to 20× while keeping https://x.com/vllm_project/status/1980235518706401405
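
If you want to try it locally, a minimal vLLM offline-inference sketch looks roughly like the following. The model id matches the Hugging Face release, but the prompt template, image placeholder, and the trust_remote_code flag are assumptions; check the model card and the vLLM recipe for the exact invocation.

```python
# Sketch: running a document image through DeepSeek-OCR with vLLM's offline API.
# The "<image>" placeholder, prompt text, and sampling settings are assumptions.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="deepseek-ai/DeepSeek-OCR", trust_remote_code=True)

page = Image.open("scanned_page.png")  # any document image on disk
outputs = llm.generate(
    {"prompt": "<image>\nFree OCR.", "multi_modal_data": {"image": page}},
    SamplingParams(temperature=0.0, max_tokens=2048),
)
print(outputs[0].outputs[0].text)
```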

🔥The wait is over. The community has been asking for pruned GLM-4.6 models, and they’re finally here! 🔥 We’re releasing REAP-pruned GLM-4.6 checkpoints at 25%, 30%, and 40% compression, now available on @huggingface for the community to explore and experiment with. https://x.com/vithursant19/status/1981476324045967785

Now available on @huggingface, the new dataset of 1.5 million task scenarios, field-tested and open-sourced by researchers from @IBM and the @UW, is designed to improve how agents interact with the world and get things done: https://x.com/IBMResearch/status/1981066891062817274

Build and serve LlamaAgents locally with llamactl – our CLI tool for creating, testing, and shipping LlamaAgents – built on top of LlamaIndex Workflows Get your workflow-driven agents running in minutes with these powerful features: 🚀 Initialize projects from templates with https://x.com/llama_index/status/1980673952033976824

Excited to release our new open-source collaboration with Meta: OpenEnv. Pushing for better research/open-source usage practices on agents (LLM/VLM/code). We want to bring reproducible practices in frontier agentic research (like the recent Code World Model) with a… https://x.com/Thom_Wolf/status/1981396028117901401

Introducing Mistral AI Studio, the production AI platform. Mistral AI Studio enables builders to move from AI experimentation to production with a robust runtime for agents and deep observability across the AI lifecycle. More on our blog: https://x.com/MistralAI/status/1981752578951233989

olmOCR – Open-Source OCR for Accurate Document Conversion https://olmocr.allen.ai/blog

Deploy your favorite OCR models with a few clicks, directly from Hugging Face 🔥 📷 we’ve added the latest bleeding edge OCR models to the Inference Endpoints catalog to make it easy for you to get started! links 👇 https://x.com/ErikKaum/status/1980965155145216336

there’s a new OlmOCR model that outperforms other OCR models, with an Apache 2.0 license 🔥 and it costs only $178 to parse a million pages 🤯 https://x.com/mervenoyann/status/1981040748133826918

I’m excited to announce that Chandra OCR is open source! – Full layout information – Extracts and captions images and diagrams – Strong handwriting, form, table support – Works with transformers and vLLM https://x.com/VikParuchuri/status/1980667137606971423

Hugging Face just unveiled FineVision: The largest & cleanest open dataset for VLMs A meticulously curated corpus of 24 million samples, unifying 200+ sources into 185 subsets via a semi-automated, human-in-the-loop pipeline. Outperforms existing open mixtures, accelerating https://x.com/HuggingPapers/status/1981093262912819418

We ran performance tests on release day firmware and an updated Ollama version to see how Ollama performs! @NVIDIAAIDev Let’s go spark! https://x.com/ollama/status/1981486870963114121

We believe open source models should work as well as proprietary ones in Cline. Here’s what we did to make it happen: SYSTEM PROMPT: Reduced GLM-4.6’s prompt from 56,499 to 24,111 characters. A 57% reduction. Faster responses, lower costs, higher success rates. PROVIDER… https://x.com/cline/status/1981420111815987494
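
Sanity-checking the prompt-size claim above (characters, not tokens; both numbers are from the tweet):

```python
before_chars, after_chars = 56_499, 24_111
reduction = (before_chars - after_chars) / before_chars
print(f"{reduction:.1%} reduction")  # 57.3%, consistent with the claimed 57%
```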

💡 vLLM @ Open Source AI Week! 1⃣ Wednesday, Oct 23 & Thursday, Oct 24: vLLM @ Pytorch Conference 2025 🚀 Explore vLLM at PyTorch Conference 2025! 📅 Sessions to catch: 1. Easy, Fast, Cheap LLM Serving for Everyone – Simon Mo, Room 2004/2006 2. Open Source Post-Training Stack: https://x.com/vllm_project/status/1980622348903674022

Excited to share OpenEnv: frontier-grade RL environments for the open-source community 🔥! https://x.com/_lewtun/status/1981380372748521929

I enjoyed speaking at #PyTorchCon today. Wanted to share one slide from my talk about open source AI infra. This is about how Ray and vLLM work together. LLM inference is growing more and more complex, and doing a good job with LLM inference means working across layers and https://x.com/robertnishihara/status/1981112722361372924

Turn natural language into SQL queries using open-source text2SQL models and our Workflows orchestration system. This comprehensive demo by @tuanacelik shows how to build sophisticated text-to-SQL applications that can understand complex queries and generate accurate SQL: 🔍 https://x.com/llama_index/status/1980309057287446532
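
For flavour, here is a minimal sketch of the shape such a text-to-SQL workflow can take with LlamaIndex Workflows. It is not @tuanacelik's demo: the text2SQL model call is stubbed out with a hard-coded query so the example runs with nothing but llama-index installed, and the tiny SQLite schema is invented.

```python
import asyncio
import sqlite3

from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step


def text_to_sql(question: str, schema: str) -> str:
    """Stand-in for an open-source text2SQL model call (hard-coded so this runs offline)."""
    return "SELECT name, price FROM products WHERE price > 10 ORDER BY price DESC"


class Text2SQLWorkflow(Workflow):
    @step
    async def answer(self, ev: StartEvent) -> StopEvent:
        sql = text_to_sql(ev.question, ev.schema)   # 1) natural language -> SQL
        rows = ev.conn.execute(sql).fetchall()      # 2) execute against the database
        return StopEvent(result={"sql": sql, "rows": rows})


async def main() -> None:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)",
        [("keyboard", 45.0), ("cable", 8.5), ("monitor", 180.0)],
    )
    result = await Text2SQLWorkflow(timeout=30).run(
        question="Which products cost more than $10?",
        schema="products(name TEXT, price REAL)",
        conn=conn,
    )
    print(result)


asyncio.run(main())
```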

Have you noticed different performance between open model inference providers? You can now run evals across providers with the @huggingface inference providers integration in InspectAI. https://x.com/dvilasuero/status/1981688436735271283

Introducing LTX-2: the most complete open-source AI creative engine. – Synchronized audio and video generation – Native 4K fidelity, up to 50 fps and 10 s+ sequences – API-first design for seamless integration into creative pipelines – Runs efficiently on consumer GPUs – Fully https://x.com/ltx_model/status/1981346235194683497

Open source coding benchmarks are operating in a different reality. They don’t test real world tasks and expect users to come prepared with a detailed page-long spec of exactly what they want to build or fix. But real people don’t use AI this way. They write vague prompts like https://x.com/pashmerepat/status/1981431374386233840

Very cool open-source work from @PyTorch on reinforcement learning environments (we helped a tiny bit)! Feels like early days on the topic with already exciting work from @PrimeIntellect @MechanizeWork @mercor_ai for example but exciting to make this topic as open-source and https://x.com/ClementDelangue/status/1981737560566005950

I expect GLM-4.6-Air to make an improvement similar to the Qwen-3 to Q3-2507 update, or maybe even the latest Qwen round. Will be the default model between 30B and 200B. https://x.com/teortaxesTex/status/1981702360981557624

Choose the “:exacto” version of open-source models in Cline to automatically route to the best inference provider for models like GLM-4.6, Qwen3-Coder, and Kimi-K2. Provider quality varies wildly, meaning the same model can yield completely different results at different endpoints. https://x.com/cline/status/1981370535176286355

Over the last 24 hours, I have finetuned three Qwen3-VL models (2B, 4B, and 8B) on the CATmuS dataset on @huggingface . The first version of the models are now available on the Small Models for GLAM organization with @vanstriendaniel ! (Link below). These are designed to work https://x.com/wjb_mattingly/status/1981736776076026044

Qwen3-VL-2B-Instruct app is out on Hugging Face https://x.com/_akhaliq/status/1980690335220351063

we just updated the model comparison on our blog for you 🫡 added Chandra, OlmOCR-2, Qwen3-VL and their averaged OlmOCR score! https://x.com/mervenoyann/status/1981396054634615280

Kimi K2 is up to 5x faster and 50% more accurate :) https://x.com/crystalsssup/status/1980147163629047854

OpenEnvs for Reinforcement Learning! 🙏 We are launching a universal RL Environment interface today, teaming up with @huggingface and @UnslothAI Let’s take a trip down memory lane: It’s 2016, you read some papers. RL looks promising. But the reality? Cartpole is best we https://x.com/bhutanisanyam1/status/1981377720157351938
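
For readers who have not written an environment before, the contract being standardized is essentially reset/step. The toy class below is a generic Gymnasium-style illustration of that contract, not the actual OpenEnv API (see the release for the real interface).

```python
# Toy illustration of the reset/step environment contract that RL tooling
# standardizes. This is NOT the OpenEnv API, just the generic shape of it.
import random
from dataclasses import dataclass


@dataclass
class StepResult:
    observation: int
    reward: float
    done: bool


class GuessTheNumberEnv:
    """Guess a hidden digit; reward 1.0 on a correct guess, and the episode ends."""

    def reset(self, seed: int | None = None) -> int:
        self._target = random.Random(seed).randint(0, 9)
        return 0  # initial observation carries no information here

    def step(self, action: int) -> StepResult:
        correct = action == self._target
        return StepResult(observation=action, reward=1.0 if correct else 0.0, done=correct)


env = GuessTheNumberEnv()
env.reset(seed=0)
for guess in range(10):
    result = env.step(guess)
    if result.done:
        print(f"solved with guess {guess}, reward {result.reward}")
        break
```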

14B, 11FPS on B200, Real-time – Apache 2.0 licensed on Hugging Face 📹 https://x.com/reach_vb/status/1980376352726610342

today we’re open-sourcing Krea Realtime. this 14B autoregressive model is 10x larger than any open-source equivalent, and it can generate long-form videos at 11 fps on a single B200. weights and technical report below 👇 https://x.com/krea_ai/status/1980358158376988747

krea/krea-realtime-video · Hugging Face https://huggingface.co/krea/krea-realtime-video

Qwen just released Qwen3-VL on Hugging Face The most powerful vision-language model in the Qwen series, with comprehensive upgrades across text understanding, visual reasoning, and long context video analysis. From GUI operations to 1M context. https://x.com/HuggingPapers/status/1980809413045940553

Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence https://x.com/_akhaliq/status/1981564465897509333
