Image created with gemini-3.1-flash-image-preview and claude-sonnet-4-5. Image prompt: Using the provided reference image, preserve the exact square faceted glass perfume bottle with warm amber liquid, crystal stopper, pure white background, soft shadow, and clean white label, but replace the label text with ‘Open Source’ in matching black serif typography. Add a delicate sterling silver chain draped around the bottle neck with a tiny dainty open padlock pendant hanging from it, the lock’s shackle lifted in unlocked position, rendered in refined high-fashion jewelry style with precise metalwork detail.
A useful guide for getting started with Hermes Agent:
https://x.com/Teknium/status/2039102514508058675
LiteParse is our open-source document parser that provides high-quality spatial text parsing with bounding boxes. It can parse hundreds of pages of table-heavy documents in seconds – and give you bounding boxes over all the text elements! 🎁 This means that any agent automation
https://x.com/jerryjliu0/status/2039730277786980833
33 hours of audio transcribed in 12 minutes! @CohereLabs just released Cohere Transcribe – 2B open-source ASR. 66 eps of 1940s CBS Suspense from @internetarchive, on an A100 via @huggingface Jobs + Buckets mount, at 161x realtime! Script + all transcripts are public
https://x.com/vanstriendaniel/status/2037548103272632497
Very hyped by the new Cohere Transcribe model 🌍 Works surprisingly well on bad quality audio when the mic doesn’t cooperate. 2B params, 14 supported languages and it’s Apache 2.0. try the official Hugging Face demo ⬇️
https://x.com/victormustar/status/2037572662659104976
. @googlegemma has open-sourced the perfect model for local open source agents. Gemma 4 comes in all the sizes we need for mobile, local, and code. This is how I’ll be switching my @thdxr opencode agent over. Let’s go local agents.
https://x.com/ben_burtenshaw/status/2039740590091362749
🎉 Gemma 4 is officially available on vLLM! Byte-for-byte, these are the most capable open models for advanced reasoning and agentic workflows. Key features include: – Native Multimodal Support: Full vision and audio capabilities with up to a 256K context window. – Broad
https://x.com/vllm_project/status/2039762998563418385
A 12-month time difference between Gemma 3 27b and Gemma 4 31b. The jump is absolutely enormous. Just look at the evaluations between the two models. GPQA doubled, AIME 2026 went from ~20% to ~90%, and so on. Crazy.
https://x.com/kimmonismus/status/2039759264680747219?s=20
A Visual Guide to Gemma 4 With almost 40 (!) custom visuals, explore the new models from Google DeepMind. We explore various techniques, ranging from Mixture of Experts and the Vision Encoder all the way up to Per-Layer Embeddings and the Audio Encoder. Link below 👇
https://x.com/MaartenGr/status/2040099556948390075
Gemma 4 — Google DeepMind
https://deepmind.google/models/gemma/gemma-4/
Gemma 4 31B (Reasoning) is very token efficient, using ~1.2M tokens on the GPQA Diamond evaluation, fewer than peer models such as Qwen3.5 27B (~1.5M) and Qwen3.5 35B A3B (~1.6M)
https://x.com/ArtificialAnlys/status/2039752015811866652
Gemma 4 31B running with TurboQuant KV cache on MLX 🔥 128K context: → KV Memory: 13.3 GB → 4.9 GB (63% reduction) → Peak Memory: 75.2 GB → 65.8 GB (-9.4 GB) → Quality preserved TurboQuant compression scales with sequence length, so the longer the context, the bigger the
https://x.com/Prince_Canuma/status/2039840313074753896
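For intuition on those numbers: KV-cache size scales linearly with both sequence length and bits per element, which is why quantization savings grow with context. A back-of-envelope sketch in Python; the layer/head/dim values below are illustrative placeholders (not Gemma 4’s actual shapes), and TurboQuant’s actual scheme is not modeled.

```python
def kv_cache_gb(layers, kv_heads, head_dim, seq_len, bits, batch=1):
    # 2 tensors (K and V) per layer, one entry per token per KV head
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bits / 8 / 1e9

# Placeholder shapes, chosen only to land near the quoted figures
cfg = dict(layers=48, kv_heads=4, head_dim=128, seq_len=128_000)

print(f"16-bit KV: {kv_cache_gb(**cfg, bits=16):.1f} GB")  # ~12.6 GB
print(f" 6-bit KV: {kv_cache_gb(**cfg, bits=6):.1f} GB")   # ~4.7 GB, ~63% less
```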
Gemma 4 outperforms models over 10x their size! (note the x-axis is log scale!)
https://x.com/demishassabis/status/2040067244349063326
Gemma 4: Our most capable open models to date
https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/
Gemma-4-31B is now live in Text Arena – ranking #3 among open models (#27 overall), matching much larger models at 10× smaller scale! A significant jump from Gemma-3-27B (+87 pts). Highlights: – #3 open (#27 overall), on par with the best open models Kimi-K2.5, Qwen-3.5-397b –
https://x.com/arena/status/2039739427715735645
Getting Started with Gemma 4 in AI Studio
https://x.com/GoogleAIStudio/status/2040090067709075732
Google just open-sourced Gemma 4. Unprecedented performance for advanced reasoning and agentic workflows, and big leap in efficiency on a parameter basis. Use it now in KerasHub. I recommend the JAX backend – best performance!
https://x.com/fchollet/status/2039845249334510016
Google just re-entered the game 🔥🔥 They want to take the crown 👑 back from Chinese open source AI. And… Gemma 4 is FINALLY Apache 2.0 aka real-open-source-licensed. From what I’ve seen it’s going to be a pretty significant model. But give it a try yourself today: brew
https://x.com/ClementDelangue/status/2039941213244072173
got Gemma 4 up and running at 34 tokens per second. this is the 26B-A4B model, running on my mac mini m4 with 16GB ram. next time i hit my claude session limits i’ll have this fast, free local AI as a backup :]
https://x.com/measure_plan/status/2040069272613834847
Got Gemma-4-26B-A4B MoE running on iPhone w/ Flash SSD in Swift MLX. Still pretty slow; I expect 10+ t/s once optimized properly for Swift.
https://x.com/anemll/status/2040126326708031969
Introducing a Visual Guide to Gemma 4 👀 An in-depth, architectural deep dive of the Gemma 4 family of models. From Per-Layer Embeddings to the vision and audio encoders. Take a look!
https://x.com/osanseviero/status/2040105484061954349
Let’s look at how the open model Gemma has progressed across its last three versions. – Gemma 4 ranks 100 places above Gemma 3 – Gemma 3 ranks 87 places above Gemma 2 All three models from @GoogleDeepMind are roughly the same size (31B, 27B, 27B), and these gains came only 9 and 13
https://x.com/arena/status/2039848959301361716
Let’s go: Running a full AI assistant locally on a MacBook Air M4 with 16GB, completely free, open source, no API keys needed. Atomic Bot makes it really simple: install, pick Gemma 4, and you have an always-on AI agent running on your machine. No cloud. No subscription. No data
https://x.com/kimmonismus/status/2039989730901623049
Meet Gemma 4: our new family of open models you can run on your own hardware. Built for advanced reasoning and agentic workflows, we’re releasing them under an Apache 2.0 license. Here’s what’s new 🧵
https://x.com/GoogleDeepMind/status/2039735446628925907
NEW: Google releases Gemma 4, their most capable open models yet! 🤯 Apache-2.0, multimodal (text, image, and audio input), and multilingual (140 languages)! They can even run 100% locally in your browser on WebGPU. Watch it describe the Artemis II launch! 🚀 Try the demo! 👇
https://x.com/xenovacom/status/2039741226337935430
Let me explain why I consider Gemma 4 a bigger release than most people realize. This is a big deal because models like Gemma 4 E4B can run directly on devices, bringing powerful AI (even a 2B model scoring ~60% on MMLU Pro) to phones, laptops, and edge systems without relying on the cloud,
https://x.com/kimmonismus/status/2039978863644537048
Today, we’re launching Gemma 4, our most intelligent open models to date. Built with the same breakthrough technology as Gemini 3, Gemma 4 brings advanced reasoning to your personal hardware and devices. Here’s what Gemma 4 unlocks for developers: — Intelligence-per-parameter:
https://x.com/GoogleAI/status/2039735543068504476
We just released Gemma 4 — our most intelligent open models to date. Built from the same world-class research as Gemini 3, Gemma 4 brings breakthrough intelligence directly to your own hardware for advanced reasoning and agentic workflows. Released under a commercially
https://x.com/Google/status/2039736220834480233
You can run Gemma 4 100% locally in your browser thanks to HF transformers.js. That means 100% private and 100% free! @xenovacom created a demo for it here:
https://x.com/ClementDelangue/status/2039782910996148508
run OpenClaw, Hermes Agent, and Pi with Gemma 4 with a few lines of change 🔥
https://x.com/mervenoyann/status/2039788257815261400
So happy to see Google release Gemma 4 today under Apache 2.0, giving you frontier capabilities locally. You can use it right away in all your favorite open agent platforms like openclaw, opencode, pi, and Hermes by asking it to change your model to local gemma 4 with
https://x.com/ClementDelangue/status/2039740419899056152
Been really cool to see the traction of @NousResearch Hermes Agent, the open source agent that grows with you! Hermes Agent is open-source and remembers what it learns and gets more capable over time, with a multi-level memory system and persistent dedicated machine access.
https://x.com/ClementDelangue/status/2037634211973140898
I just had a very magical moment with the Hermes Agent by @NousResearch . My Hermes agent messaged my business partner’s Hermes agent, and they established a secure connection. They went a few rounds back and forth, introduced themselves, and updated notes on the current
https://x.com/fancylancer3991/status/2037579517389144399
Going to install Hermes today. Never did get around to OpenClaw. From what I’ve read about Hermes, I’m kind of glad I waited. Excited to give it a go
https://x.com/soundslikecanoe/status/2038611090704113931
Openclaw took me weeks to deploy and get going. Something still breaks daily. I still love it. Hermes took 15 min to setup and get running, fully local, Discord, local model. Crazy… Keep tinkering. Stay agnostic.
https://x.com/charliehinojosa/status/2039384870091465202
Switched to Hermes over OpenClaw a few weeks back and it’s been largely smooth sailing and a blissful experience. For those still using OpenClaw, is it a lot more smooth sailing these days too?
https://x.com/Zeneca/status/2039836468928233875
You can switch to Hermes in 2 minutes. They have an import function from OpenClaw. Smart @NousResearch.
https://x.com/AntoineRSX/status/2039017227270156395
Open Models have crossed a threshold
https://blog.langchain.com/open-models-have-crossed-a-threshold/
🚀 Qwen3.5-Omni is here! Scaling up to a native omni-modal AGI. Meet the next generation of Qwen, designed for native text, image, audio, and video understanding, with major advances in both intelligence and real-time interaction. A standout feature: ‘Audio-Visual Vibe Coding’.
https://x.com/Alibaba_Qwen/status/2038636335272194241
Demo2:Audio-Visual Vibe Coding
https://x.com/Alibaba_Qwen/status/2038637124619231467
Here’s another demo of Audio-Visual Vibe Coding~
https://x.com/Alibaba_Qwen/status/2038641496455557565
Qwen
https://qwen.ai/blog?id=qwen3.5-omni
Qwen
https://qwen.ai/blog?id=qwen3.6
Chinese OpenSource models are gonna mug Anthropic & OpenAI like they never existed before. The coding gap between open and closed-source is practically gone. GLM-5.1 gives almost the same coding performance, going toe-to-toe with Claude Opus, but a roughly 10x
https://x.com/XFreeze/status/2037695882301436412
Cohere has released Cohere Transcribe: an open weights model achieving 4.7% on AA-WER, based on 3 datasets including our proprietary AA-AgentTalk dataset The 2B parameter model is based on a conformer encoder-decoder architecture. It was trained from scratch on 14 languages
https://x.com/ArtificialAnlys/status/2038678855213568031
Cycle your keys and oauths for the same provider when one runs out – now in Hermes Agent latest. `hermes update` to access!
https://x.com/Teknium/status/2039096442313396514
Deeper dive into some of the updates in v0.7. Memory: We have begun transitioning each of the systems in Hermes Agent to work through defined interfaces so that the core code is more maintainable, and more providers for everything can be supported. We started with memory: Now
https://x.com/Teknium/status/2040151297991770435
Hermes Agent now supports @plastic_lab’s Honcho, @mem0ai, @openvikingai, @Vectorizeio’s Hindsight, @retaindb, and @ByteroverDev memory systems! Try them now with `hermes update` then `hermes memory setup`. We have overhauled our memory system to be much more maintainable and
https://x.com/Teknium/status/2039912975444926885
installed the icarus plugin on my Hermes agent. it picked up all 6 tools automatically. The agent works across slack, telegram, discord. every session gets captured. after a month of running you have hundreds of real decisions logged. then you tell the agent “train yourself.”
https://x.com/IcarusHermes/status/2038524251355934872
It’s FINALLY HERE! Multi Agent Profiles, so you can have as many independent bots as you like, each with their own memory, gateway connections, skills, chat history, everything! To use: Run `hermes update` and look for multi agent profiles. User Guide:
https://t.co/i0R8puqJ6k Reference:
https://x.com/Teknium/status/2038694680549077059
Our biggest day EVER with Hermes Agent, we’re now #5 biggest AI App on OpenRouter metrics! What do you want to see in the next update?
https://x.com/Teknium/status/2039788883312087231
Your Hermes agent writes things every session — research, skills, decisions, logs. After a few weeks, you’ve got hundreds of files sitting in the working directory. But the agent can’t read them all every session. It doesn’t know which ones matter for this question. So it either
https://x.com/jphorism/status/2039822829412405671
Excited about our new paper: AI Agent Traps. AI agents inherit every vulnerability of the LLMs they’re built on – but their autonomy, persistence, and access to tools create an entirely new attack surface: the information environment itself. The web pages, emails, APIs, and
https://x.com/FranklinMatija/status/2039001719007330530
It’s time for open-source agent tools to rely primarily on open-source models, instead of closed-source APIs that send all your data to the cloud and ultimately will get hacked and/or shut down
https://x.com/ClementDelangue/status/2038552830638755962
Are open source models catching up to proprietary models? We’ve looked back at 3 years of Arena’s data to show how the race has evolved. For comparison, we’ve taken the top 20% of the models and uncovered the following: – Before mid 2024: The gap was between 100-150 points – In
https://x.com/arena/status/2037584085997216100
Mistral secures $830 million in debt financing to fund AI data center
https://www.cnbc.com/2026/03/30/mistral-ai-paris-data-center-cluster-debt-financing.html
China’s DeepSeek suffers rare outage lasting several hours
.@GoogleDeepMind Gemma 4 is here with state-of-the-art models targeting edge and workstations. Requires Ollama 0.20+, which is rolling out. 4 models: 4B Effective (E4B): `ollama run gemma4:e4b`; 2B Effective (E2B): `ollama run gemma4:e2b`; 26B (4B active MoE): `ollama run gemma4:26b`
https://x.com/ollama/status/2039738348647108680
.@UnslothAI supports @GoogleGemma 4 models, optimized for RTX GPUs. 🦥 Run & fine-tune locally in Unsloth Studio.
https://x.com/NVIDIA_AI_PC/status/2040096993800761579
Axolotl support for Gemma 4 is here: v0.16.1 is released! Finetune @GoogleAIStudio Gemma4 26B-A4B on your own 5090 using our optimized fused MoE+LoRA kernels!
https://x.com/winglian/status/2039823559363629432
Deploy Gemma4 31B and 26B-A4B with one click on Hugging Face Inference Endpoints 🔥👇
https://x.com/ErikKaum/status/2040008281796513939
Excited to launch Gemma 4: the best open models in the world for their respective sizes. Available in 4 sizes that can be fine-tuned for your specific task: 31B dense for great raw performance, 26B MoE for low latency, and effective 2B & 4B for edge device use – happy building!
https://x.com/demishassabis/status/2039736628659269901
Flagship open-weight release days are always exciting. Was just reading through the Gemma 4 reports, configs, and code, and here are my takeaways: Architecture-wise, besides multimodal support, Gemma 4 (31B) looks pretty much unchanged compared to Gemma 3 (27B). Gemma 4
https://x.com/rasbt/status/2039780905619705902
future is local 🔥 Google DeepMind just released Gemma 4: local frontier in many sizes, all modalities with free license 🤯 we ship Gemma 4 in transformers, llama.cpp, transformers.js and more for your convenience 🫡 plug-and-play with your agents 🙌🏻 read our blog ⤵️
https://x.com/mervenoyann/status/2039739097611215344
Gemma
https://x.com/OfficialLoganK/status/2039486016751366431
Gemma 4 26B MoE (4B active) on a single RTX 4090: – 162 t/s decode – 8,400 t/s prefill – Full 262K native context — 19.5 GB VRAM – Only 10 Elo below the 31B dense Q8_0 on dual 4090+3090: 9,024 t/s prefill at 10K. 2,537 t/s at full 262K — that’s a novel in about 100
https://x.com/basecampbernie/status/2039847254534852783
Gemma 4 architecture analysis thread. Just like Gemma 3n, this thing has a galaxy-brained architecture, very much not a standard transformer
https://x.com/norpadon/status/2039740827975500251
Gemma 4 by @GoogleDeepMind debuts at 3rd and 6th on the open source leaderboard, making it the #1 ranked US open source model. By total parameter count, Gemma 4 31B is 24× smaller than GLM-5 and 34× smaller than Kimi-K2.5-Thinking, delivering comparable performance at a
https://x.com/arena/status/2039782449648214247
Gemma 4 is here! The best open-source model you can run on your machine. Day-0 support in llama.cpp. Check it out!
https://x.com/ggerganov/status/2039744468899811419
Gemma 4 is live on Baseten and available to all customers on day 0 via the Baseten model library. All models in the Gemma 4 family are multimodal, supporting text and image inputs with text output. Key capabilities include: -> Advanced reasoning and thinking -> Coding and
https://x.com/baseten/status/2039751071284015393
Gemma4 is amazing. You’ll read that everywhere. Let’s focus on what is HUGE here: the revenge of dense models…. Throw away your b200, not needed anymore, throw away the millions of lines of code we had to write to make MOEs faster, training stable etc… throw away your
https://x.com/art_zucker/status/2039740402517893361
Google DeepMind’s impressive fully-open Gemma 4 is live day-zero on Modular Cloud. Modular provides the fastest performance on NVIDIA Blackwell and AMD MI355X, thanks to MAX and Mojo🔥. The team took this impressive new model to production inference in days.🚀
https://x.com/clattner_llvm/status/2039738590213910558
google gemma 4 architecture is very interesting and every model has some subtle differences, here is a recap: > per layer embedding only on the small variant > no attention scale (usually you divide qk^T by sqrt(d), they don’t) > they do QK norm + V norm as well > they share
https://x.com/eliebakouch/status/2039751171556954531
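The two attention quirks in that recap are easy to show in isolation. A minimal PyTorch sketch, assuming standard pre-norm residual wiring elsewhere; head counts and dims are placeholders, not Gemma 4’s real configuration:

```python
import torch.nn.functional as F
from torch import nn

class UnscaledQKVNormAttention(nn.Module):
    # Illustrative sketch of the recap above: RMSNorm applied to Q, K, and V
    # per head, and no 1/sqrt(d) scaling on the attention logits.
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.out = nn.Linear(dim, dim, bias=False)
        self.q_norm = nn.RMSNorm(self.head_dim)
        self.k_norm = nn.RMSNorm(self.head_dim)
        self.v_norm = nn.RMSNorm(self.head_dim)

    def forward(self, x):
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (b, t, self.n_heads, self.head_dim)
        q = self.q_norm(q.view(shape)).transpose(1, 2)
        k = self.k_norm(k.view(shape)).transpose(1, 2)
        v = self.v_norm(v.view(shape)).transpose(1, 2)
        # scale=1.0 disables the usual 1/sqrt(head_dim) factor
        o = F.scaled_dot_product_attention(q, k, v, is_causal=True, scale=1.0)
        return self.out(o.transpose(1, 2).reshape(b, t, -1))
```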
Google has released Gemma 4, a new family of multimodal open-weight models including Gemma 4 E2B, Gemma 4 E4B, Gemma 4 31B and Gemma 4 26B A4B @GoogleDeepMind’s new Gemma 4 family introduces four multimodal models supporting text, image, and video inputs. We evaluated Gemma 4
https://x.com/ArtificialAnlys/status/2039752013249212600
Google releases Gemma 4. ✨ Gemma 4 introduces 4 models: E2B, E4B, 26B-A4B, 31B. The multimodal reasoning models are under Apache 2.0. Run E2B and E4B on ~6GB RAM, and on phones. Run 26B-A4B and 31B on ~18GB. GGUFs:
https://t.co/fpX21yWbge Guide:
https://x.com/UnslothAI/status/2039739190536286313
I have to give credit to Google for Apache 2.0 on Gemma 4! This is huge!
https://x.com/QuixiAI/status/2039862230452252926
Intel is partnering with @GoogleAI to deliver fully functional #Gemma4 models on Intel hardware from day zero–across Intel Xeon CPUs, Intel Xe GPUs, and Intel Core Ultra processors, with support across open frameworks including @vllm_project and @huggingface. This means
https://x.com/intelnews/status/2040106767258906707
Just do this: `brew install llama.cpp --HEAD` Then: `llama-server -hf ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M`
https://x.com/julien_c/status/2039746054355067002
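Once that server is up, llama-server speaks an OpenAI-compatible HTTP API (default port 8080), so any stock client works. A minimal sketch in Python; the prompt is arbitrary and the host/port assume defaults:

```python
import requests

# POST to llama-server's OpenAI-compatible chat endpoint
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Give me three uses for a local 26B MoE model."}],
        "max_tokens": 256,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```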
Let me demonstrate the true power of llama.cpp: – Running on Mac Studio M2 Ultra (3 years old) – Gemma 4 26B A4B Q8_0 (full quality) – Built-in WebUI (ships with llama.cpp) – MCP support out of the box (web-search, HF, github, etc.) – Prompt speculative decoding The result:
https://x.com/ggerganov/status/2039752638384709661
Say hello to Gemma 4 from @GoogleDeepMind 🚀🔥 💎 Comes in 4 sizes: E2B, E4B, 26B A4B, 31B 💎 Supports vision and reasoning 💎 Apache 2.0 💎 Available now in LM Studio
https://x.com/lmstudio/status/2039738625525502426
Son led the development on the HF/llama.cpp side for adding support for the new Gemma 4 models. As always, he did an outstanding job throughout the collaboration with the Google DeepMind team. Day-0 support is possible thanks to his hard work!
https://x.com/ggerganov/status/2039943099284140286
Thanks for following us! We’re excited to see what you all build with Gemma 4! In case you missed it, you can find all our checkpoints, with an Apache 2.0 License, on Hugging Face:
https://x.com/googlegemma/status/2040107948010242075
thinking about google’s gemma 4 and what it means. a few months ago, running something this capable locally meant serious hardware and serious tradeoffs on quality. now it runs on your laptop, works offline on your phone (!!!), speaks 140 languages natively, 256k context window,
https://x.com/gregisenberg/status/2039853864082424198
Today we’re releasing Gemma 4, our new family of open foundation models, built on the same research and technology as our Gemini 3 series. These models set a new standard for open intelligence, offering SOTA reasoning capabilities from edge-scale (2B and 4B w/ vision/audio) up
https://x.com/JeffDean/status/2039748604232122707
Two years ago, we released Gemma, Google DeepMind’s family of open models. Today, I’m thrilled to share a new milestone: Gemma has hit 400M downloads and 100,000 variants! Thank you to every developer, partner, and contributor. We can’t wait to see what you build next!👀
https://x.com/osanseviero/status/2039120000095547722
What you need to know about @googlegemma 4: 4️⃣ 4 sizes (E2B, E4B, 26B4A, 31B) 🪟 Up to 256K context window 🛠️ Native function-calling, structured JSON output 👁️ + audio on edge models (E2B/E4B) 🌍 Trained on 140+ languages 🏆 31B ranks #3 open model on Arena AI 🪪 Apache 2.0
https://x.com/_philschmid/status/2039736207676965264
Yowza! @ollama is on it with new Gemma 4 models
https://x.com/MichaelGannotti/status/2039903041642508541
Gemma 4 31B shifts the Pareto frontier, scoring +30 Arena points above similarly priced models like DeepSeek 3.2. Its position on the Pareto frontier is based on early pricing indicators from third parties.
https://x.com/arena/status/2040128319719670101
impressive, very nice. now let’s compare a 31b dense to a 31b active 670b total instead. flop for flop
https://x.com/stochasticchasm/status/2039912148676264334
Gemma’s MoE models differ from the likes of DeepSeek and Qwen: instead of using shared experts in parallel to the routed ones, Gemma adds MoE blocks as separate layers in addition to the normal MLP blocks. So the architecture is Attention -> MLP -> MoE
https://x.com/norpadon/status/2039750841754697767
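In code, that layering reads roughly like the skeleton below. An illustrative sketch only, not Gemma’s implementation: the sub-module internals and the residual placement are my assumptions.

```python
from torch import nn

class GemmaStyleBlock(nn.Module):
    """Sketch of the described layering: the routed-experts block is a
    separate sub-layer *after* the dense MLP, not a replacement for it
    and not a shared-expert branch running in parallel."""
    def __init__(self, attn: nn.Module, mlp: nn.Module, moe: nn.Module):
        super().__init__()
        self.attn, self.mlp, self.moe = attn, mlp, moe

    def forward(self, x):
        x = x + self.attn(x)  # attention sub-layer (residual assumed)
        x = x + self.mlp(x)   # dense MLP sub-layer
        x = x + self.moe(x)   # MoE appended as its own sub-layer
        return x
```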
Nemotron Super / Ultra Arcee Trinity Large (soon) Gemma 4 (eventually) Reflection’s first models (maybe) GPT OSS 2? (maybe) Thinky? Other neolabs? Things looking up for open models built in the US in 2026. We had 0 for a bit there.
https://x.com/natolambert/status/2039499358325129530
Storage Buckets for Spaces: You can now mount HF Buckets as persistent storage volumes directly in your Spaces. In Space settings, the new “Storage Buckets” section lets you create or select a bucket, set the mount path and access mode. You can also attach a bucket when creating
https://x.com/_akhaliq/status/2039404288082894912
.@MistralAI’s new Voxtral TTS generates expressive, multilingual speech from just ~3 seconds of reference audio It solves one of the hardest problems in speech, separating what you say from how you sound ➡️ Voxtral factorizes speech into two parts: • semantic tokens → the
https://x.com/TheTuringPost/status/2038285318827413800
Real-life conversation AI, powered by Voxtral 😎
https://x.com/sophiamyang/status/2037523809914241069
Long context windows are now available for select models on Tinker! – 128k tokens for Kimi K2.5 and GPT-OSS-120B – 256k for Nemotron 3 Super 120B and Qwen3.5 397B. For more details and pricing, see our full model lineup:
https://x.com/tinkerapi/status/2039424320393621649
A Hermes cron job that scans for major new vulnerabilities, then checks, notifies, and even resolves them if they exist locally might be a pretty great use case!
https://x.com/Teknium/status/2039022907020689898
Must-read AI research of the week: ▪️ Learning to Commit: Generating Organic Pull Requests via Online Repository Memory ▪️ Effective Strategies for Asynchronous Software Engineering Agents ▪️ Composer 2 ▪️ From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow
https://x.com/TheTuringPost/status/2038763668079550900
Repo is officially live. Thank you for the support and encouragement. I hope everyone likes it. Please send feedback when you can. Thank you. @NousResearch @Teknium
https://x.com/aijoey/status/2039108098174906514
Thanks for using Hermes Agent @OrdinaryGamers!
https://x.com/NousResearch/status/2039402523711140094
The Hermes Agent update you’ve been waiting for is here.
https://x.com/NousResearch/status/2038688578201346513
Wouldn’t have known how good hermes is if not for my dead laptop. Just standard setup. No fancy plug-ins or skills. Running GLM5. Whatever it needed to learn, it did by chatting with me. @Teknium and team, thank you so much for working on this. It’s so damn good. F’ing awesome
https://x.com/AnomalistG/status/2039969500968501748
We have integrated @huggingface as a first-class inference provider in Hermes Agent. When you select Hugging Face in the model picker it now shows 28 curated models organized by use case, with a custom option for the 100+ other models they serve.
https://x.com/NousResearch/status/2037654827929338324
Holo3 is here 🚀. Today, we’re launching Holo3: our new series of frontier computer-use models. 78.9% on OSWorld-Verified. That puts us ahead of GPT-5.4 and Opus 4.6, at one-tenth of the cost. Weights on Hugging Face. API is live. Test it now! #Holo3 #OpenSource #ComputerUse
https://x.com/hcompany_ai/status/2039021096649805937
@NousResearch @Teknium Hermes has been running 20 mins straight on trying to solve something. Openclaw would have lost its way by now. Second time tonight it’s been running long trying to solve things. This is magic. Hermes also fixed my Openclaw agent which now runs better. Wow
https://x.com/erick_lindberg_/status/2039897087878275580
Is it just me or does codex 5.4 give better answers and results when using Hermes-agent versus OpenClaw? I mean not sort of kind of, but literally like you are using a completely better model? @Teknium what’s the secret sauce? I spent a lot of time on OpenClaw getting it “just
https://x.com/alexcovo_eth/status/2037589212648665273
it’s pretty obvious at this point. Hermes Agent > OpenClaw
https://x.com/VadimStrizheus/status/2039523211369762875
The Hermes agent has been on my mind for the past few days. Honestly, I thought OpenClaw would dominate the market a while longer, but a strong competitor seems to have arrived, even if it isn’t fully proven yet. There’s a team in the US called NousResearch. Nous Research is one of the leading startups/research teams in open-source AI.
https://x.com/supernovajunn/status/2039847124687605811
Tried Nous Research’s Hermes Agent, and the experience is far better than OpenClaw. It’s an open-source autonomous agent: once installed it stays resident on your server, has persistent memory, and gets smarter the longer you use it. 40+ built-in tools: web search, terminal, file system, and browser automation all included. It connects to Telegram, Discord, Slack, and WhatsApp, can schedule tasks in natural language, and runs multiple sub-agents in parallel
https://x.com/evanlong_me/status/2039026061640601816
@Zeneca I really tried to make OpenClaw work with Kimi 2.5, but it was unusable with anything smaller than Sonnet 4.6… Hermes, on the other hand, runs mostly without issues with Qwen 3.5 35B driving. So yeah, a pretty big difference.
https://x.com/Everlier/status/2039853380844081260
🚨 We can download models, but not see how they were built. Introducing daVinci-LLM: the most transparent LLM pretraining project. 🔓 Open source: model weights, data pipeline, training process, ablations. 🎯 3B model matching 7B performance 🔗 Report:
https://t.co/HgTcXhSQSS (1/7)
https://x.com/QinYi88814/status/2038971910835560921
An open-source tool to search for interesting or optimal mathematical objects under constraints ↓
https://x.com/TheTuringPost/status/2037224197538136288
Feel like this thesis gets more and more evidence behind it every day. Cursor, Chroma, Pinterest, Cognition, Decagon, Hippocratic, Intercom (and many many more behind the scenes) all realising that the way to own the compounding flywheel of value is specialising an open-source
https://x.com/oneill_c/status/2038689976012149131
This is a noteworthy release. I don’t think there has ever been a real open source model from the US this close to the frontier. Looking forward to trying it out.
https://x.com/xlr8harder/status/2039389523403059257
Warning to open source maintainers: the Axios supply chain attack started with some very sophisticated social engineering targeted at one of their developers
https://x.com/simonw/status/2040080868958765229
Arcee AI | Trinity-Large-Thinking: Scaling an Open Source Frontier Agent
https://www.arcee.ai/blog/trinity-large-thinking
Chat LangChain is now embedded directly in our docs 📚 You can ask questions grounded in: • Full docs (LangSmith + OSS) • Knowledge base • OSS code We’ve been investing heavily in developer experience. This is one step toward making everything easier and more accessible.
https://x.com/LangChain/status/2039387501140275431
Environments in LangSmith Prompt Hub Environments give you a proper promotion workflow for your prompts: – Assign any commit to Staging or Production – Promote between environments instantly – Roll back with a single click from a full deployment history – Reference reserved tags
https://x.com/LangChain/status/2037666098561032421
great example to see how the Hercules team uses LangSmith + LLM as a judge to enrich their trace data to capture customer sentiment many models are cheap enough that it’s often worth using them to identify semantics that regex alone can’t capture ex: don’t judge but i or may
https://x.com/Vtrivedy10/status/2039186184161616245
Today we’re releasing TRL v1. 75+ methods. SFT, DPO, GRPO, async RL to take advantage of the latest and greatest open-source. 6 years from first commit to the library that post-trains most open models in the world. Built to be future proof. pip install trl
https://x.com/ClementDelangue/status/2039121367656702102
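For anyone kicking the tires, SFTTrainer has long been TRL’s entry point for supervised fine-tuning. A minimal sketch, assuming v1 keeps that API; the model checkpoint and dataset names are placeholders:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset: any chat-formatted dataset with a "messages" column
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder checkpoint
    args=SFTConfig(output_dir="./sft-out"),
    train_dataset=dataset,
)
trainer.train()
```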
Training mRNA Language Models Across 25 Species for $165
https://huggingface.co/blog/OpenMed/training-mrna-models-25-species
When processing long contexts, large language models often lose track of details or devolve into nonsense. Researchers reduced these effects by managing context externally. MIT’s Alex L. Zhang, Tim Kraska, and Omar Khattab developed Recursive Language Models (RLMs) that process
https://x.com/DeepLearningAI/status/2039831830979838240
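The recursion itself fits in a few lines. This sketch is only the decomposition skeleton under my own simplifying assumptions (blind halving, a generic `llm` callable); in the paper the model manages the stored context programmatically rather than splitting it blindly.

```python
from typing import Callable

def rlm_answer(question: str, context: str, llm: Callable[[str], str],
               max_chars: int = 8_000) -> str:
    # Small contexts go straight to the model...
    if len(context) <= max_chars:
        return llm(f"Context:\n{context}\n\nQuestion: {question}")
    # ...large ones are split, answered recursively, then merged,
    # so no single call ever sees the full context.
    mid = len(context) // 2
    parts = [rlm_answer(question, context[:mid], llm, max_chars),
             rlm_answer(question, context[mid:], llm, max_chars)]
    return llm(f"Merge these partial answers to '{question}':\n\n"
               + "\n\n".join(parts))
```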
As usual, we open-source everything, MIT license:
https://t.co/qZiy0FgTg8 Code:
https://t.co/4VpzxgWQGp Paper:
https://t.co/8E15zjokkM CaP-X is brought to you by NVIDIA, Berkeley, Stanford, and CMU. I’d like to thank the legend @Ken_Goldberg who co-advised the work, and the team
https://x.com/DrJimFan/status/2039360925606760690
OpenEvidence has become the default medical knowledge platform for over 40% of U.S. physicians; it’s relied on daily for the highest-stakes decisions in medicine. Baseten is honored to power the inference behind it.
https://x.com/tuhinone/status/2040113371593474176
Nathan Lambert’s ATOM Project Seeks American Open Source AI Models – The New Stack
https://thenewstack.io/nathan-lamberts-atom-project-seeks-american-open-source-ai-models/
Today we’re releasing Trinity-Large-Thinking. Available now on the Arcee API, with open weights on Hugging Face under Apache 2.0. We built it for developers and enterprises that want models they can inspect, post-train, host, distill, and own.
https://x.com/arcee_ai/status/2039369121591120030
🚨 397 billion parameters. On a MacBook. No cloud. No GPU cluster. No data center. A laptop. Someone ran one of the largest AI models on Earth on a machine you can buy at the Apple Store. It’s called flash-moe. A pure C and Metal inference engine that runs Qwen3.5-397B on a
https://x.com/heynavtoor/status/2038614549973401699
Just tried out the new qwen3.5:4b-nvfp4 @ollama model on an M1 Max (in a project where it’s used with a Koog AI agent)… 38% faster than qwen3.5:4b (averaged over 5 runs of the agent).
https://x.com/joreilly/status/2039002786130534618
this model is an agentic treasure. it has been #1 trending for 3 weeks on @huggingface as mentioned by @danielhanchen. it’s Qwen 3.5 27B fine-tuned on Opus 4.6 distilled data and beats Sonnet 4.5 on SWE-bench verified and more. “Runs locally on 16GB in 4-bit or 32GB in 8-bit.”
https://x.com/Hesamation/status/2038642306434150427
Almost signed up for ElevenLabs to narrate my blog. $330/month. Then I tried running an open-source model on my own laptop. Qwen 3.5 14B. Sounds fine. 200 posts a month. Costs me electricity. I almost paid $4,000 a year to rent a model I can run myself. Most AI subscriptions
https://x.com/TheGeorgePu/status/2037473248577782046
Alibaba’s Qwen3.5-Omni just dropped with script-level captioning, audio-visual vibe coding, and real-time web search built in. However, there is a catch: Omni here doesn’t mean *creating* image or voice, but rather interpreting it. So, a caveat. Open access via Hugging.
https://x.com/kimmonismus/status/2038638427604762666
Function Calling Harness: From 6.75% to 100%
https://autobe.dev/blog/function-calling-harness-qwen-meetup-korea/
Holo3, new model of @hcompany_ai outperforming closed and larger open models on GUI navigation 🔥 > A3B/35B based on Qwen3.5 > officially supported in transformers 🤗 > free license 👏
https://x.com/mervenoyann/status/2039327292665561577
I benchmarked various formats of Qwen3.5 27B: BF16, FP8, NVFP4, and INT4 on: RTX Pro 6000, B200, H100 If you have an RTX Pro 6000, INT4 is your best option for faster inference. And it’s probably also true for the RTX 5090.
https://x.com/bnjmn_marie/status/2037564190802563157
I upgraded my Ollama to use MLX and my QWEN3.5:36b speed 2.2Xd instantly.
https://x.com/Shawkat_m1/status/2039014724071719405
I’ve pushed my TurboQuant vLLM to GitHub: TQ 2.5/3.5; fused Triton KV write path; Triton decode-attn from packed KV; real engine/runtime integration; calibration + metadata flow; substantial test coverage. Qwen3.5-35B AWQ, 1M context, 4M KV cache, ZGX GB10
https://x.com/iotcoi/status/2037478891179135123
Just tested this as I was skeptical and it works surprisingly well actually (with their llama.cpp fork). Looks like a continued pretraining of qwen3-8b in 1bit 👀. Full weights report below and github/hf instructions: ALL 399 TENSORS token_embd.weight 4096×151669
https://x.com/nisten/status/2039100896840134935
Qwen3.5-35B compressed 20% with a ~1% performance drop on average. Now you can fit this (4-bit) with full context on 24GB of VRAM (~$700, or 1x 3090)
https://x.com/0xSero/status/2037560787565252666
This scatter plot shows the Pareto frontier of intelligence vs. size, defined by models like Qwen3 0.6B, 1.7B, 4B, 8B, and Ministral3 3B. The 1-bit Bonsai family shifts that frontier dramatically to the left. This changes the tradeoff itself: models no longer have to be large
https://x.com/PrismML/status/2039049405815529559
vLLM-Omni v0.18.0 is out — 324 commits from 83 contributors (38 new), aligned with vLLM v0.18.0. 🎉 🗣️ Production TTS/Omni serving: Qwen3-TTS, Qwen3-Omni, Fish Speech S2 Pro, Voxtral TTS 🎨 Diffusion runtime refactor with cache-dit/TeaCache and TP/SP/HSDP scaling 🔢 Unified
https://x.com/vllm_project/status/2038415516772299011
your spotify cache is bigger than our largest AI model. Bonsai: 1-bit weights. 1.7B to 8B params. 14x compression vs bf16. 8x faster on edge. 256 MB to 1.2GB. Based on Qwen 3. we just came out of stealth. intelligence belongs at the edge and we’re going to put it there.
https://x.com/HessianFree/status/2039049800398655730
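Those size claims check out on the back of an envelope. A quick sketch; quantization scale/metadata overhead is ignored, which is why the shipped artifact is somewhat larger than the raw bit count:

```python
params = 1.7e9  # smallest Bonsai size mentioned above

bf16_mb = params * 2 / 2**20    # 2 bytes per weight
onebit_mb = params / 8 / 2**20  # 1 bit per weight, no scale overhead

print(f"bf16:  {bf16_mb:,.0f} MB")   # ~3,243 MB
print(f"1-bit: {onebit_mb:,.0f} MB") # ~203 MB; with per-group scales this
                                     # lands near the quoted 256 MB, i.e.
                                     # roughly the 14x compression vs bf16
```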
here it is! ~4000 agent traces of GLM-5 in hermes-agent, all uploaded to hf. thanks to @pingToven for supplying openrouter credits necessary for this. next step, fine-tune a Qwen3.5!😆
https://x.com/kaiostephens/status/2038414350986207421
Qwen 27b on the 3090 saving me a bag. This is cost savings for 7 days of usage, w/ Hermes agent. Assuming 80% cache hit (unlikely) and no cache timeout. This is conservative. 27b is between sonnet and 5.4 mini. This is just my tokens in/out w/ api costs, assuming no rate
https://x.com/LottoLabs/status/2037557925015949676
Voxtral TTS paper is out! it’s a good read 🙂
https://x.com/qtnx_/status/2037553397423902846
🚀 Unitree open-sources UnifoLM-WBT-Dataset — a high-quality real-world humanoid robot whole-body teleoperation (WBT) dataset for open environments. 🥳Publicly available since March 5, 2026, the dataset will continue to receive high-frequency rolling updates. It aims to establish
https://x.com/UnitreeRobotics/status/2037440578275946551