Image created with gemini-3.1-flash-image-preview (prompt drafted with claude-sonnet-4-5). Image prompt: Using the provided reference image, preserve the exact square faceted perfume bottle with warm amber-gold liquid, crystal stopper, pure white background, soft shadow, and glass refractions; replace the label text with ‘International’ in the same clean black serif font; add a delicate sterling silver chain draped around the bottle neck holding one small dainty globe pendant with etched continents in high-fashion jewelry aesthetic, maintaining luxury fragrance photography lighting and composition.

🚀 Qwen3.5-Omni is here! Scaling up to a native omni-modal AGI. Meet the next generation of Qwen, designed for native text, image, audio, and video understanding, with major advances in both intelligence and real-time interaction. A standout feature: ‘Audio-Visual Vibe Coding’.
https://x.com/Alibaba_Qwen/status/2038636335272194241

Demo 2: Audio-Visual Vibe Coding
https://x.com/Alibaba_Qwen/status/2038637124619231467

Here’s another demo of Audio-Visual Vibe Coding~
https://x.com/Alibaba_Qwen/status/2038641496455557565

Qwen blog: Qwen3.5-Omni
https://qwen.ai/blog?id=qwen3.5-omni

Qwen blog: Qwen3.6
https://qwen.ai/blog?id=qwen3.6

Chinese open-source models are gonna mug Anthropic & OpenAI like they never existed before. The coding gap between open and closed source is practically gone: GLM-5.1 gives coding performance that goes toe-to-toe with Claude Opus, but at roughly 10x…
https://x.com/XFreeze/status/2037695882301436412

Mistral secures $830 million in debt financing to fund AI data center
https://www.cnbc.com/2026/03/30/mistral-ai-paris-data-center-cluster-debt-financing.html

China’s DeepSeek suffers rare outage lasting several hours

MoE models differ from the likes of DeepSeek and Qwen: instead of using shared experts in parallel to the routed ones, Gemma adds MoE blocks as separate layers in addition to the normal MLP blocks. So the architecture is Attention -> MLP -> MoE
https://x.com/norpadon/status/2039750841754697767
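The ordering described above can be sketched in a few lines. Here is a toy numpy sketch (not Gemma's actual implementation; the dimensions, top-k routing, and ReLU MLPs are placeholder choices), showing a dense MLP sub-layer followed by a *separate* routed-MoE sub-layer, each with its own residual, rather than a shared expert running in parallel with the routed ones:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 8, 4, 2

def mlp(x, w1, w2):
    # Simple two-layer ReLU MLP on a single token vector.
    return np.maximum(x @ w1, 0) @ w2

# Dense MLP weights, a bank of expert MLPs, and a router (all random toys).
w1, w2 = rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d))
experts = [(rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d)))
           for _ in range(n_experts)]
router = rng.normal(size=(d, n_experts))

def moe(x):
    logits = x @ router
    top = np.argsort(logits)[-top_k:]        # top-k expert indices
    w = np.exp(logits[top]); w /= w.sum()    # softmax over the selected experts
    return sum(wi * mlp(x, *experts[i]) for wi, i in zip(w, top))

def gemma_style_block(x):
    # Per the tweet: Attention -> MLP -> MoE, with the MoE block as its
    # own residual sub-layer after the dense MLP. (Attention omitted here;
    # a real block would apply it first.)
    x = x + mlp(x, w1, w2)   # dense MLP sub-layer
    x = x + moe(x)           # separate MoE sub-layer
    return x

x = rng.normal(size=d)
y = gemma_style_block(x)
print(y.shape)  # (8,)
```

Contrast with the DeepSeek/Qwen pattern, where the shared expert's output would be summed with the routed experts inside one sub-layer instead of occupying its own layer slot.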

.@MistralAI’s new Voxtral TTS generates expressive, multilingual speech from just ~3 seconds of reference audio. It solves one of the hardest problems in speech: separating what you say from how you sound. ➡️ Voxtral factorizes speech into two parts: • semantic tokens → the…
https://x.com/TheTuringPost/status/2038285318827413800
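The factorization idea can be illustrated with a toy sketch. This is emphatically not Voxtral's actual architecture; the codebook, embeddings, and "decoder" below are made up purely to show the separation: a discrete semantic-token sequence carries the content, a speaker embedding carries the voice, and swapping the embedding changes how the same tokens are rendered.

```python
import numpy as np

rng = np.random.default_rng(0)
semantic_tokens = np.array([17, 4, 99, 23])  # stands in for tokenized content
alice = rng.normal(size=16)   # toy "speaker embedding" from ~3 s of audio
bob = rng.normal(size=16)     # a different speaker

codebook = rng.normal(size=(128, 16))  # toy content-vector lookup table

def toy_decoder(tokens, speaker):
    # "Audio" = content vectors modulated by the speaker embedding.
    return codebook[tokens] * speaker

a = toy_decoder(semantic_tokens, alice)
b = toy_decoder(semantic_tokens, bob)
# Same content tokens, different rendering per speaker:
print(a.shape, np.allclose(a, b))  # (4, 16) False
```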

Real-life conversation AI, powered by Voxtral 😎
https://x.com/sophiamyang/status/2037523809914241069

Long context windows are now available for select models on Tinker! – 128k tokens for Kimi K2.5 and GPT-OSS-120B – 256k for Nemotron 3 Super 120B and Qwen3.5 397B. For more details and pricing, see our full model lineup:
https://x.com/tinkerapi/status/2039424320393621649

ClawHub now has an official China mirror 🇨🇳🦞
https://t.co/d8Odd4sNOp Just tell your agent: “Find skills on ClawHub using
https://t.co/NoR7AXyM6U” Thanks @BytePlusGlobal / VolcanoEngine for the infra sponsorship 🙏 Other regions need a mirror? PRs welcome.
https://x.com/openclaw/status/2039240359197438229

@Zeneca I really tried to make OpenClaw work with Kimi 2.5, but it was unusable with anything smaller than Sonnet 4.6… Hermes and Qwen 3.5 35B drive it mostly without issues. So yeah, a pretty big difference.
https://x.com/Everlier/status/2039853380844081260

🚨 397 billion parameters. On a MacBook. No cloud. No GPU cluster. No data center. A laptop. Someone ran one of the largest AI models on Earth on a machine you can buy at the Apple Store. It’s called flash-moe. A pure C and Metal inference engine that runs Qwen3.5-397B on a…
https://x.com/heynavtoor/status/2038614549973401699

Just tried out the new qwen3.5:4b-nvfp4 @ollama model on an M1 Max here (in a project where it’s used with a Koog AI agent): 38% faster than qwen3.5:4b (averaged over 5 runs of the agent).
https://x.com/joreilly/status/2039002786130534618

this model is an agentic treasure. It has been #1 trending for 3 weeks on @huggingface, as mentioned by @danielhanchen. It’s Qwen 3.5 27B fine-tuned on Opus 4.6 distilled data and beats Sonnet 4.5 on SWE-bench Verified and more. “Runs locally on 16GB in 4-bit or 32GB in 8-bit.”
https://x.com/Hesamation/status/2038642306434150427

Almost signed up for ElevenLabs to narrate my blog. $330/month. Then I tried running an open-source model on my own laptop. Qwen 3.5 14B. Sounds fine. 200 posts a month. Costs me electricity. I almost paid $4,000 a year to rent a model I can run myself. Most AI subscriptions…
https://x.com/TheGeorgePu/status/2037473248577782046

Alibaba’s Qwen3.5-Omni just dropped with script-level captioning, audio-visual vibe coding, and real-time web search built in. However, there is a catch: Omni here doesn’t mean *creating* images or voice, but rather interpreting them. So, a caveat. Open access via Hugging Face.
https://x.com/kimmonismus/status/2038638427604762666

Function Calling Harness: From 6.75% to 100%
https://autobe.dev/blog/function-calling-harness-qwen-meetup-korea/
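The title suggests a validation-and-retry loop wrapped around model tool calls. Here is a generic sketch of such a harness; the schema check, retry policy, and error-feedback prompt are assumptions for illustration, not autobe's actual design:

```python
import json

# Hypothetical tool declaration: name plus required argument types.
TOOL = {
    "name": "get_weather",
    "parameters": {"city": str, "unit": str},
}

def validate_call(raw: str):
    """Return (ok, errors) for a raw JSON tool-call string."""
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as e:
        return False, [f"invalid JSON: {e}"]
    errors = [f"missing required field '{k}'"
              for k in TOOL["parameters"] if k not in args]
    errors += [f"field '{k}' should be {t.__name__}"
               for k, t in TOOL["parameters"].items()
               if k in args and not isinstance(args[k], t)]
    return not errors, errors

def harness(model, prompt, max_retries=3):
    """Ask `model` (any callable str -> str) for a tool call; on a bad
    call, feed the validation errors back and retry instead of failing."""
    msg = prompt
    for _ in range(max_retries):
        raw = model(msg)
        ok, errors = validate_call(raw)
        if ok:
            return json.loads(raw)
        msg = prompt + "\nFix these errors and answer again: " + "; ".join(errors)
    raise RuntimeError("model never produced a valid call")

# Toy "model" that misses a required field once, then corrects itself.
replies = iter(['{"city": "Seoul"}',
                '{"city": "Seoul", "unit": "celsius"}'])
result = harness(lambda m: next(replies), "Call get_weather for Seoul.")
print(result)  # {'city': 'Seoul', 'unit': 'celsius'}
```

The general point stands regardless of the specifics: checking arguments against the schema and looping on errors is how a harness can lift a weak base rate toward near-perfect valid-call rates.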

Holo3, @hcompany_ai’s new model, outperforms closed and larger open models on GUI navigation 🔥 > A3B/35B based on Qwen3.5 > officially supported in transformers 🤗 > free license 👏
https://x.com/mervenoyann/status/2039327292665561577

I benchmarked various formats of Qwen3.5 27B: BF16, FP8, NVFP4, and INT4 on: RTX Pro 6000, B200, H100 If you have an RTX Pro 6000, INT4 is your best option for faster inference. And it’s probably also true for the RTX 5090.
https://x.com/bnjmn_marie/status/2037564190802563157

I upgraded my Ollama to use MLX and my qwen3.5:36b speed instantly 2.2×’d.
https://x.com/Shawkat_m1/status/2039014724071719405

I’ve pushed my TurboQuant vLLM to GitHub: TQ 2.5/3.5 fused Triton KV write path, Triton decode-attn from packed KV, real engine/runtime integration, calibration + metadata flow, substantial test coverage. Qwen3.5-35B AWQ, 1M context, 4M KV cache, ZGX GB10.
https://x.com/iotcoi/status/2037478891179135123

Just tested this as I was skeptical, and it actually works surprisingly well (with their llama.cpp fork). Looks like continued pretraining of qwen3-8b in 1-bit 👀. Full weights report below and GitHub/HF instructions: ALL 399 TENSORS token_embd.weight 4096×151669…
https://x.com/nisten/status/2039100896840134935

Qwen3.5-35B compressed 20% with a ~1% performance drop on average. Now you can fit this (4-bit) with full context on 24GB of VRAM (~$700, or 1x 3090).
https://x.com/0xSero/status/2037560787565252666

This scatter plot shows the Pareto frontier of intelligence vs. size, defined by models like Qwen3 0.6B, 1.7B, 4B, 8B, and Ministral3 3B. The 1-bit Bonsai family shifts that frontier dramatically to the left. This changes the tradeoff itself: models no longer have to be large…
https://x.com/PrismML/status/2039049405815529559

vLLM-Omni v0.18.0 is out — 324 commits from 83 contributors (38 new), aligned with vLLM v0.18.0. 🎉 🗣️ Production TTS/Omni serving: Qwen3-TTS, Qwen3-Omni, Fish Speech S2 Pro, Voxtral TTS 🎨 Diffusion runtime refactor with cache-dit/TeaCache and TP/SP/HSDP scaling 🔢 Unified…
https://x.com/vllm_project/status/2038415516772299011

your spotify cache is bigger than our largest AI model. Bonsai: 1-bit weights. 1.7B to 8B params. 14x compression vs bf16. 8x faster on edge. 256 MB to 1.2GB. Based on Qwen 3. we just came out of stealth. intelligence belongs at the edge and we’re going to put it there.
https://x.com/HessianFree/status/2039049800398655730
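The quoted sizes check out as simple arithmetic, assuming the 14x is measured against bf16 at 2 bytes per parameter (which works out to roughly 16/14 ≈ 1.14 effective bits per weight):

```python
# Back-of-the-envelope check of the tweet's numbers, assuming 14x
# compression is relative to bf16 at 2 bytes per parameter.
BYTES_BF16 = 2
for params in (1.7e9, 8e9):
    bf16_gb = params * BYTES_BF16 / 1e9
    compressed_mb = params * BYTES_BF16 / 14 / 1e6
    print(f"{params/1e9:.1f}B params: {bf16_gb:.1f} GB bf16 -> "
          f"~{compressed_mb:.0f} MB at 14x")
# 1.7B: 3.4 GB -> ~243 MB; 8B: 16.0 GB -> ~1143 MB, matching the
# quoted "256 MB to 1.2 GB" range to within rounding/overhead.
```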

here it is! ~4000 agent traces of GLM-5 in hermes-agent, all uploaded to hf. thanks to @pingToven for supplying openrouter credits necessary for this. next step, fine-tune a Qwen3.5!😆
https://x.com/kaiostephens/status/2038414350986207421

Qwen 27b on the 3090 saving me a bag. This is cost savings for 7 days of usage w/ Hermes agent, assuming 80% cache hit (unlikely) and no cache timeout, so this is conservative. 27b is between Sonnet and 5.4 mini. This is just my tokens in/out w/ API costs, assuming no rate…
https://x.com/LottoLabs/status/2037557925015949676
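The savings math can be sketched as a small calculator. All prices, token counts, and the cache discount below are made-up placeholders, not the tweet's actual numbers; the point is just how an 80% cache-hit assumption feeds into an API-cost estimate:

```python
def api_cost(tok_in, tok_out, price_in, price_out,
             cache_hit=0.80, cache_discount=0.10):
    """Cost in dollars for a given traffic volume; cached input tokens
    are billed at a fraction of the normal input price. Prices are per
    million tokens."""
    cached = tok_in * cache_hit
    fresh = tok_in - cached
    return (fresh * price_in
            + cached * price_in * cache_discount
            + tok_out * price_out) / 1e6

# e.g. 50M input / 2M output tokens over a week at $3/$15 per Mtok:
cost = api_cost(50e6, 2e6, price_in=3.0, price_out=15.0)
print(f"${cost:.2f}")  # $72.00
```

Compare that weekly figure against electricity for a 3090 and the hardware pays for itself quickly at agent-level token volumes.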

U.S. Senators Tom Cotton [R] and Chuck Schumer [D] plan to introduce the American Security Robotics Act to ban the federal government from buying or operating Chinese-made humanoid robots. The bill prohibits federal funding for these systems and requires phasing out existing…
https://x.com/TheHumanoidHub/status/2038088330378879443

Voxtral TTS paper is out! it’s a good read 🙂
https://x.com/qtnx_/status/2037553397423902846

Discover more from Ethan B. Holland
