Locally Run: AI News Week Ending 04/17/2026

April 17, 2026

Image created with gemini-3.1-flash-image-preview with claude-opus-4.7. Image prompt: Using the provided reference image, preserve every element exactly — the marigold-orange backdrop, the seated young woman with closed eyes and faint smile in her purple-and-white windbreaker, the tattooed singer in the red beanie and layered red vest, the lighting, framing, and depth of field — but replace only the black handheld microphone with a small matte-black home mini-PC server held to his mouth, complete with tiny glowing green and amber status LEDs, visible ventilation grille, and a short stubby Wi-Fi antenna, gripped the same way at the same position with seamless photographic realism. After generating the image, overlay the text “Local” in the upper-left corner of the frame in large, bold, all-caps ITC Avant Garde Gothic Pro Medium (or a near-identical geometric sans-serif if unavailable), pure white (#FFFFFF), with no date, subtitle, drop shadow, or outline. The text should be substantial in scale — taking up a meaningful portion of the upper-left area — with comfortable margin from the top and left edges, set against the negative space of the orange backdrop so it does not overlap or obscure the singer, the seated woman, or the replaced object.

Sub-32B open-weights models now offer GPT-5-level intelligence, with Qwen3.5 27B (Reasoning) matching GPT-5 (medium) at 42 and Gemma 4 31B (Reasoning) matching GPT-5 (low) at 39 on the Artificial Analysis Intelligence Index. @Alibaba_Qwen’s Qwen3.5 and @GoogleDeepMind’s Gemma 4
https://x.com/ArtificialAnlys/status/2043929874537296026

When set up on a Mac mini, Personal Computer can run 24/7 in the background across all your apps and files. Start a task from your iPhone, and Personal Computer can operate on your desktop and local files using 2FA. Requires the latest iOS update from the App Store.
https://x.com/perplexity_ai/status/2044806021244497964

Alibaba released Qwen3.6-35B-A3B today. Big jump compared to the Qwen3.5-35B model. It’s a sparse MoE: 35B total params, only 3B active. Natively multimodal, with thinking and non-thinking modes. Hard facts: SWE-bench Verified: 73.4, near the dense Qwen3.5-27B (75.0), way ahead of
https://x.com/kimmonismus/status/2044780695361290347
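To put the "35B total, only 3B active" figures in perspective, here is a rough back-of-the-envelope sketch. It assumes the common rule of thumb of about 2 FLOPs per active parameter per generated token, which ignores attention cost and routing overhead; the function names are illustrative and not from any Qwen tooling.

```python
# Rough sketch: how much of a sparse MoE is used per token, and how its
# per-token compute compares with a dense model. All numbers are estimates.

def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of the model's parameters that participate in each token."""
    return active_params_b / total_params_b

def approx_flops_per_token(active_params_b: float) -> float:
    """Rule-of-thumb estimate: ~2 FLOPs per active parameter per token."""
    return 2.0 * active_params_b * 1e9

moe_total_b, moe_active_b = 35.0, 3.0   # figures quoted in the post
dense_b = 27.0                          # dense Qwen3.5-27B, all params active

frac = active_fraction(moe_total_b, moe_active_b)
ratio = approx_flops_per_token(dense_b) / approx_flops_per_token(moe_active_b)

print(f"Active share of the MoE per token: {frac:.1%}")
print(f"Dense 27B vs 3B-active MoE, estimated compute per token: {ratio:.0f}x")
```

Note that all 35B parameters still have to sit in memory; the sparsity lowers compute per token, not the download size.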

Small models are cheap to run, but expensive to adapt. The hard part is not only fine-tuning. It is the surrounding loop that involves collecting data, diagnosing failures, building evals, avoiding regressions, choosing curricula, and deciding when an update is safe. This new
https://x.com/dair_ai/status/2044435861580984700

The beauty of DFlash is that it reuses the hidden states of the active model, so you can use DFlash adapters for the base models with post-trained models like Carnice 9B/27B by @kaiostephens and Ornstein by @DJLougen and get these local ~4x speedups for your local
https://x.com/winglian/status/2043731370598347066

4 reasons Gemma 4’s architecture runs efficiently on your hardware:
1. Local + global attention structure: 4 or 5 local layers + 1 final global layer to preserve the context understanding
2. Special optimizations for global attention:
– 8 query heads per KV head in Grouped Query
https://x.com/TheTuringPost/status/2043086456412082356
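The local + global layout described in the post can be sketched in a few lines. This is only an illustration: the layer count, sliding-window size, context length, and query-head count below are hypothetical placeholders, not Gemma 4's published configuration, and the grouped-query helper reflects one reading of the truncated "8 query heads per KV head" point.

```python
# Sketch of an interleaved local/global attention stack and its effect on
# KV-cache size. All concrete numbers here are hypothetical placeholders.
from typing import List

def layer_pattern(num_layers: int, locals_per_global: int) -> List[str]:
    """Repeat `locals_per_global` local layers followed by one global layer."""
    period = locals_per_global + 1
    return ["global" if (i + 1) % period == 0 else "local"
            for i in range(num_layers)]

def cached_tokens(pattern: List[str], context_len: int, window: int) -> int:
    """Token slots kept in the KV cache, summed over layers.

    Global layers keep the whole context; local layers keep only the window.
    """
    total = 0
    for kind in pattern:
        total += context_len if kind == "global" else min(window, context_len)
    return total

def kv_heads_for(query_heads: int, group_size: int = 8) -> int:
    """Grouped-query attention: one KV head is shared by `group_size` query heads."""
    return query_heads // group_size

pattern = layer_pattern(num_layers=12, locals_per_global=5)
hybrid = cached_tokens(pattern, context_len=8192, window=1024)
full = cached_tokens(["global"] * len(pattern), context_len=8192, window=1024)

print(pattern)
print(f"Cached token slots, hybrid vs all-global: {hybrid} vs {full}")
print(f"KV heads for 32 query heads at 8 per group: {kv_heads_for(32)}")
```

Under these placeholder numbers, most layers only ever cache the sliding window, which is where the memory saving on long contexts comes from.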

r/LocalLLaMA + r/LocalLLM + r/SillyTavernAI preferred models list – Apr 2026

Model	Size/Class	Format	Hosted Provider	Best Local Path	Notes
Huihui Gemma 4 E2B Abliterated v2	E2B	GGUF	No	Ollama / llama.cpp	Gemma 4 MoE with ~2B active params. Multimodal (image+text in, text out). Abliterated for reduced refusal. Lightweight enough to run fast, but MoE active-param sizing means quality punches above its weight class.
Huihui Gemma 4 E4B Abliterated	E4B	GGUF	No	Ollama / llama.cpp	Same Gemma 4 MoE family as E2B but with ~4B active params. Multimodal. Better quality ceiling than E2B at the cost of more compute per token.
SultrySilicon V2	7B	GGUF	No	Ollama / llama.cpp	Roleplay-focused 7B model. One of the smallest in the set. Good for quick creative/RP sanity checks, not for reasoning or instruction-following benchmarks.
Huihui-GLM-4.6V-Flash-Abliterated	9B	GGUF	No	Ollama / llama.cpp	Based on Z.ai GLM-4.6V-Flash. Vision-language model (image+text). Abliterated. Bilingual Chinese/English. Fast inference variant of the GLM-4.6V family.
Gemma-2-Ataraxy-9B	9B	GGUF	No	Ollama / llama.cpp	Merge of Gemma-2-9B-SimPO and Gemma-2-Gutenberg-9B. Creative writing and roleplay oriented. Scored well on EQ-Bench. Good balance of instruction-following and literary quality at 9B.
MythoMax-L2-13B	13B	GGUF	No	Ollama / llama.cpp	By Gryphe. Llama 2 merge of MythoLogic-L2 and Huginn using experimental per-tensor gradient merging. One of the most downloaded RP/creative models ever (~59k GGUF downloads). Strong at both roleplay and storywriting. Alpaca format. The OG.
Dan’s PersonalityEngine V1.3.0	24B	GGUF	No	Ollama / llama.cpp	Fine-tuned from Mistral Small 3.1 24B Base. Trained on a massive mix: roleplay, storywriting, tool use, math, reasoning, code, medical, legal, and survival topics. Multilingual (EN, AR, DE, FR, ES, HI, PT, JA, KO). A genuine generalist with personality.
SuperGemma4 26B Abliterated Multimodal	26B multimodal	GGUF	No	custom multimodal stack	Based on Gemma 4 26B-A4B. Multimodal (image-text-to-text). Abliterated with low refusal. Optimized for Apple Silicon (MLX). Supports Korean + English. Tool use and coding tags.
Gemma 3 27B Abliterated	27B	GGUF	No	Ollama / llama.cpp	Abliterated version of Google’s Gemma 3 27B instruct. Multimodal (image-text-to-text). Reduced refusal behavior while preserving instruction-following quality.
Huihui Gemma 4 31B Abliterated	31B	GGUF	No	Ollama / llama.cpp	Abliterated Gemma 4 31B instruct. Multimodal (any-to-any pipeline tag). Dense 31B, not MoE. Strongest Gemma 4 dense abliterated option.
Gemma 4 31B Abliterated	31B	GGUF + safetensors	No	Ollama / llama.cpp	Same base as above (Gemma 4 31B-it) but different abliteration method using mlabonne’s harmful_behaviors + harmless_alpaca datasets. Both formats in one repo.
Huihui-Qwen3.5-35B-A3B-Claude-4.6-Opus-Abliterated	35B A3B	GGUF	No	Ollama / llama.cpp	Qwen 3.5 MoE (35B total, ~3B active). Distilled from Claude 4.6 Opus reasoning. Chain-of-thought and reasoning-focused. Abliterated. Multimodal. Punches well above its active param count on reasoning tasks.
Midnight Rose 70B v2.0.3	70B	GGUF	No	Ollama / llama.cpp	By sophosympatheia. Complex multi-stage SLERP/DARE-TIES merge of WizardLM, Tulu-2-DPO, Dolphin, and earlier Midnight Rose versions. Uncensored. Designed for roleplay and storytelling. Scored surprisingly high on EQ-Bench even at low quants. ~6k context sweet spot.
Midnight Miqu 70B v1.5	70B	GGUF	No	Ollama / llama.cpp	Llama-family merge of Midnight-Miqu v1.0 and Tess-70B. Creative writing and roleplay focused. 32k context. Known for strong prose quality and character consistency at 70B scale.
Midnight Rose 103B v2.0.3	103B	GGUF	No	heavy self-host	Same lineage as the 70B but scaled up. Importance-matrix GGUF by mradermacher. Firmly in the “need real hardware” category.
DeepSeek V3	671B A37B	safetensors	Yes: DeepInfra, Novita	Hosted preferred	Massive MoE. 671B total, 37B active. Strong on code, math, and instruction-following. Pre-trained on ~15T tokens. Use via OpenRouter, not locally.
DeepSeek V3.2	685B A37B	safetensors	No confirmed provider yet	Hosted preferred	Successor to V3. Same general architecture class. Not a local play.
Behemoth-123B-v1	123B	GGUF	No	heavy self-host	Mistral-family 123B. Creative/RP community model. Massive parameter count makes it impractical for casual local use but prized for output quality in the r/LocalLLM community.
Monstral-123B	123B	GGUF	No	heavy self-host	Mistral-family 123B. Text generation and chat focused. Same weight class as Behemoth, different training mix and community lineage.
BlackSheep-Large	~27B	GGUF	No	Ollama / llama.cpp	By TroyDoesAI. Canonical repo is gated. Q8_0 is ~29.5 GB, placing it in the 27B-class. Community RP/creative model.

So much in this release, but the one many have been waiting for above the rest: the GUI dashboard! Manage and monitor your Hermes Agent with a local web dashboard. Use the `hermes dashboard` command to start it!
https://x.com/Teknium/status/2043771509123232230

Is there a collection somewhere of the best agent/coding harnesses for each model, especially open-source and local ones? In my opinion, the biggest reason why people are struggling with open/local models these days is that the agent/coding harnesses in most open agent are not
https://x.com/ClementDelangue/status/2044139560355901911

Today we’re releasing Personal Computer. Personal Computer integrates with the Perplexity Mac App for secure orchestration across your local files, native apps, and browser. We’re rolling this out to all Perplexity Max subscribers and everyone on the waitlist starting today.
https://x.com/perplexity_ai/status/2044805973085454518

2-bit Qwen3.6-35B-A3B did a complete repo bug hunt with evidence, repro, fixes, tests and a PR writeup. 🔥 Run it locally in Unsloth Studio with just 13GB RAM. 2-bit Qwen3.6 GGUF made 30+ tool calls, searched 20 sites and executed Python code. GitHub:
https://x.com/UnslothAI/status/2044858346948464743

Qwen3.6-35B-A3B can now be run locally!💜 The model is the strongest mid-sized LLM on nearly all benchmarks. Run on 23GB RAM via Unsloth Dynamic GGUFs. GGUFs to run:
https://t.co/VlyW8UwDjw Guide:
https://x.com/UnslothAI/status/2044786492451778988
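The RAM figures in the two Unsloth posts above (13GB for the 2-bit GGUF, 23GB for the other quoted build) can be sanity-checked with simple arithmetic. The sketch below assumes a uniform bit width across all 35B weights, which is a simplification: dynamic quants typically keep some layers at higher precision, and the KV cache and runtime need memory too, so real footprints land above the uniform estimate. The function names are illustrative only.

```python
# Back-of-the-envelope weight-memory estimates for a 35B-parameter model.
# Assumes every weight uses the same bit width, which real quants do not.

def weight_memory_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = total_params_b * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

def implied_bits_per_weight(ram_gb: float, total_params_b: float) -> float:
    """Upper bound on average bits per weight if all the RAM held weights."""
    return ram_gb * 1e9 * 8 / (total_params_b * 1e9)

total_b = 35.0  # total parameters quoted for Qwen3.6-35B-A3B

for bits in (2, 4, 8, 16):
    print(f"{bits:>2}-bit uniform: {weight_memory_gb(total_b, bits):.2f} GB")

for ram in (13.0, 23.0):  # RAM figures quoted in the posts
    print(f"{ram:.0f} GB budget: at most "
          f"{implied_bits_per_weight(ram, total_b):.2f} bits per weight on average")
```

A uniform 2-bit estimate comes out below the quoted 13GB, which is consistent with mixed-precision layers and runtime overhead taking up the difference.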

Medical AI models now run on iPhone. No cloud. No API. OpenMed 1.0.0 just shipped. MLX backend for Apple Silicon. Swift package for macOS and iOS. 200+ PII detection models across 8 languages. `pip install openmed`. Open source. Apache 2.0.
https://x.com/MaziyarPanahi/status/2044037968659103806