Image created with gemini-3.1-flash-image-preview and claude-opus-4.7. Image prompt: Using the provided reference image, preserve every element exactly — the marigold-orange backdrop, the seated young woman with closed eyes and faint smile in her purple-and-white windbreaker, the tattooed singer in the red beanie and layered red vest, the lighting, framing, and depth of field — but replace only the black handheld microphone with a small matte-black home mini-PC server held to his mouth, complete with tiny glowing green and amber status LEDs, visible ventilation grille, and a short stubby Wi-Fi antenna, gripped the same way at the same position with seamless photographic realism. After generating the image, overlay the text “Local” in the upper-left corner of the frame in large, bold, all-caps ITC Avant Garde Gothic Pro Medium (or a near-identical geometric sans-serif if unavailable), pure white (#FFFFFF), with no date, subtitle, drop shadow, or outline. The text should be substantial in scale — taking up a meaningful portion of the upper-left area — with comfortable margin from the top and left edges, set against the negative space of the orange backdrop so it does not overlap or obscure the singer, the seated woman, or the replaced object.
Sub-32B open-weights models now offer GPT-5-level intelligence, with Qwen3.5 27B (Reasoning) matching GPT-5 (medium) at 42 and Gemma 4 31B (Reasoning) matching GPT-5 (low) at 39 on the Artificial Analysis Intelligence Index. @Alibaba_Qwen’s Qwen3.5 and @GoogleDeepMind’s Gemma 4
https://x.com/ArtificialAnlys/status/2043929874537296026
When set up on a Mac mini, Personal Computer can run 24/7 in the background across all your apps and files. Start a task from your iPhone, and Personal Computer can operate on your desktop and local files using 2FA. Requires the latest iOS update from the App Store.
https://x.com/perplexity_ai/status/2044806021244497964
Alibaba released Qwen3.6-35B-A3B today. A big jump compared to the Qwen3.5-35B model. It’s a sparse MoE: 35B total params, only 3B active. Natively multimodal, with thinking and non-thinking modes. Hard facts: SWE-bench Verified: 73.4, near the dense Qwen3.5-27B (75.0), way ahead of
https://x.com/kimmonismus/status/2044780695361290347
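To make the “35B total, 3B active” point concrete, here is a back-of-the-envelope sketch, assuming the figures quoted above and the standard ~2-FLOPs-per-active-parameter rule of thumb, of why a sparse MoE is so much cheaper per token than a dense model of the same size:

```python
# Back-of-the-envelope: per-token compute for a sparse MoE vs. a dense model.
# Figures are the ones quoted in the post (35B total, ~3B active); the
# ~2 FLOPs-per-parameter rule is a standard rough approximation.

total_params = 35e9    # all experts must be stored in memory
active_params = 3e9    # parameters actually used per token

flops_per_token_moe = 2 * active_params
flops_per_token_dense = 2 * total_params   # a hypothetical dense 35B for comparison

print(f"MoE per-token FLOPs:   {flops_per_token_moe:.2e}")
print(f"Dense per-token FLOPs: {flops_per_token_dense:.2e}")
print(f"Compute ratio: ~{flops_per_token_dense / flops_per_token_moe:.0f}x cheaper per token")
# The flip side for local use: memory still scales with total params, since
# all 35B must fit in RAM/VRAM — which is why quantized GGUFs matter so much.
```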
Small models are cheap to run, but expensive to adapt. The hard part is not only fine-tuning. It is the surrounding loop that involves collecting data, diagnosing failures, building evals, avoiding regressions, choosing curricula, and deciding when an update is safe. This new
https://x.com/dair_ai/status/2044435861580984700
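As a rough illustration of that surrounding loop, here is a minimal, hypothetical Python skeleton. Every function is a stub and every name is a placeholder rather than any real framework’s API; the point is only to show where eval-gated regression checks sit relative to the fine-tune itself:

```python
import random

# Hypothetical sketch of the adaptation loop: collect failures, build data,
# fine-tune, then gate the update on evals so it only ships without regressions.
def collect_failures(model):  return ["case-1", "case-2"]  # diagnose failures
def build_dataset(failures):  return failures              # curate training data
def finetune(model, dataset): return model + 1             # stand-in for tuning
def run_eval(model, name):    return random.random()       # stand-in eval score

EVALS = ["instruction-following", "code", "safety"]

def maybe_update(model, threshold=0.0):
    dataset = build_dataset(collect_failures(model))
    candidate = finetune(model, dataset)
    old = {e: run_eval(model, e) for e in EVALS}
    new = {e: run_eval(candidate, e) for e in EVALS}
    regressed = [e for e in EVALS if new[e] < old[e] - threshold]
    # Deciding when an update is safe: ship only if nothing regressed.
    return (candidate, True) if not regressed else (model, False)

model, shipped = maybe_update(model=0)
print("shipped new model" if shipped else "held back due to regressions")
```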
The beauty of DFlash is that it reuses the hidden states of the active model, so you can use DFlash adapters for the base models with post-trained models like Carnice 9B/27B by @kaiostephens and Ornstein by @DJLougen and get these ~4x speedups for your local
https://x.com/winglian/status/2043731370598347066
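For context, DFlash-style adapters belong to the draft-and-verify (speculative decoding) family. The toy below sketches the generic greedy accept/reject loop with stub models; it is not DFlash’s actual algorithm, and in particular it does not show DFlash’s trick of reusing the target model’s hidden states for the draft:

```python
# Toy greedy speculative decoding: a cheap draft proposes k tokens, the target
# verifies them in one pass. Accepted prefixes come "for free", which is where
# ~4x speedups come from when the draft agrees often. Simplified: the standard
# bonus token on full acceptance is omitted.

def draft_next(tokens):
    # cheap draft model (stub)
    return (tokens[-1] + 1) % 50

def target_next(tokens):
    # expensive target model (stub that mostly agrees with the draft)
    return (tokens[-1] + 1) % 50 if tokens[-1] % 7 else 0

def speculative_decode(tokens, rounds=20, k=4):
    for _ in range(rounds):
        # 1) Draft proposes k tokens autoregressively (cheap).
        proposal, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Target verifies: accept the longest agreeing prefix, then emit
        #    its own token at the first disagreement.
        ctx = list(tokens)
        for t in proposal:
            want = target_next(ctx)
            if want == t:
                tokens.append(t)
                ctx.append(t)
            else:
                tokens.append(want)
                break
    return tokens

print(speculative_decode([1]))
```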
4 reasons Gemma 4’s architecture runs efficiently on your hardware:
1. Local + global attention structure: 4 or 5 local layers + 1 final global layer to preserve context understanding.
2. Special optimizations for global attention: 8 query heads per KV head in Grouped Query
https://x.com/TheTuringPost/status/2043086456412082356
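A small sketch of what those two properties mean in practice, using the ratios quoted in the thread; the layer count, head counts, head dimension, and sequence length below are illustrative assumptions, not Gemma 4’s actual config:

```python
# Property 1: a repeating local:global layer pattern ("4 or 5 local + 1 global").
N_LAYERS = 48                    # illustrative
LOCAL_PER_GLOBAL = 5

layer_types = ["global" if (i + 1) % (LOCAL_PER_GLOBAL + 1) == 0 else "local"
               for i in range(N_LAYERS)]
print(layer_types[:12])          # five 'local' layers, then 'global', repeating

# Property 2: Grouped Query Attention with 8 query heads per KV head.
# Only KV heads are cached, so the KV cache shrinks 8x vs. full multi-head.
n_q_heads, q_per_kv = 32, 8      # illustrative head counts
n_kv_heads = n_q_heads // q_per_kv
head_dim, seq_len, bytes_per = 128, 8192, 2   # fp16 cache entries
kv_cache_bytes = 2 * N_LAYERS * n_kv_heads * head_dim * seq_len * bytes_per
print(f"KV heads: {n_kv_heads}, KV cache ≈ {kv_cache_bytes / 2**30:.2f} GiB")
# (Local layers additionally attend over a sliding window, shrinking their
# cache further — not modeled here.)
```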
r/LocalLLaMA + r/LocalLLM + r/SillyTavernAI preferred models list – April 2026 (a minimal llama.cpp loading sketch follows the table)
| Model | Size/Class | Format | Hosted Provider | Best Local Path | Notes |
|---|---|---|---|---|---|
| Huihui Gemma 4 E2B Abliterated v2 | E2B | GGUF | No | Ollama / llama.cpp | Gemma 4 MoE with ~2B active params. Multimodal (image+text in, text out). Abliterated for reduced refusal. Lightweight enough to run fast, but MoE active-param sizing means quality punches above its weight class. |
| Huihui Gemma 4 E4B Abliterated | E4B | GGUF | No | Ollama / llama.cpp | Same Gemma 4 MoE family as E2B but with ~4B active params. Multimodal. Better quality ceiling than E2B at the cost of more compute per token. |
| SultrySilicon V2 | 7B | GGUF | No | Ollama / llama.cpp | Roleplay-focused 7B model. Smallest in the set. Good for quick creative/RP sanity checks, not for reasoning or instruction-following benchmarks. |
| Huihui-GLM-4.6V-Flash-Abliterated | 9B | GGUF | No | Ollama / llama.cpp | Based on Z.ai GLM-4.6V-Flash. Vision-language model (image+text). Abliterated. Bilingual Chinese/English. Fast inference variant of the GLM-4.6V family. |
| Gemma-2-Ataraxy-9B | 9B | GGUF | No | Ollama / llama.cpp | Merge of Gemma-2-9B-SimPO and Gemma-2-Gutenberg-9B. Creative writing and roleplay oriented. Scored well on EQ-Bench. Good balance of instruction-following and literary quality at 9B. |
| MythoMax-L2-13B | 13B | GGUF | No | Ollama / llama.cpp | By Gryphe. Llama 2 merge of MythoLogic-L2 and Huginn using experimental per-tensor gradient merging. One of the most downloaded RP/creative models ever (~59k GGUF downloads). Strong at both roleplay and storywriting. Alpaca format. The OG. |
| Dan’s PersonalityEngine V1.3.0 | 24B | GGUF | No | Ollama / llama.cpp | Fine-tuned from Mistral Small 3.1 24B Base. Trained on a massive mix: roleplay, storywriting, tool use, math, reasoning, code, medical, legal, and survival topics. Multilingual (EN, AR, DE, FR, ES, HI, PT, JA, KO). A genuine generalist with personality. |
| SuperGemma4 26B Abliterated Multimodal | 26B multimodal | GGUF | No | custom multimodal stack | Based on Gemma 4 26B-A4B. Multimodal (image-text-to-text). Abliterated with low refusal. Optimized for Apple Silicon (MLX). Supports Korean + English. Tool use and coding tags. |
| Gemma 3 27B Abliterated | 27B | GGUF | No | Ollama / llama.cpp | Abliterated version of Google’s Gemma 3 27B instruct. Multimodal (image-text-to-text). Reduced refusal behavior while preserving instruction-following quality. |
| Huihui Gemma 4 31B Abliterated | 31B | GGUF | No | Ollama / llama.cpp | Abliterated Gemma 4 31B instruct. Multimodal (any-to-any pipeline tag). Dense 31B, not MoE. Strongest Gemma 4 dense abliterated option. |
| Gemma 4 31B Abliterated | 31B | GGUF + safetensors | No | Ollama / llama.cpp | Same base as above (Gemma 4 31B-it) but different abliteration method using mlabonne’s harmful_behaviors + harmless_alpaca datasets. Both formats in one repo. |
| Huihui-Qwen3.5-35B-A3B-Claude-4.6-Opus-Abliterated | 35B A3B | GGUF | No | Ollama / llama.cpp | Qwen 3.5 MoE (35B total, ~3B active). Distilled from Claude 4.6 Opus reasoning. Chain-of-thought and reasoning-focused. Abliterated. Multimodal. Punches well above its active param count on reasoning tasks. |
| Midnight Rose 70B v2.0.3 | 70B | GGUF | No | Ollama / llama.cpp | By sophosympatheia. Complex multi-stage SLERP/DARE-TIES merge of WizardLM, Tulu-2-DPO, Dolphin, and earlier Midnight Rose versions. Uncensored. Designed for roleplay and storytelling. Scored surprisingly high on EQ-Bench even at low quants. ~6k context sweet spot. |
| Midnight Miqu 70B v1.5 | 70B | GGUF | No | Ollama / llama.cpp | Llama-family merge of Midnight-Miqu v1.0 and Tess-70B. Creative writing and roleplay focused. 32k context. Known for strong prose quality and character consistency at 70B scale. |
| Midnight Rose 103B v2.0.3 | 103B | GGUF | No | heavy self-host | Same lineage as the 70B but scaled up. Importance-matrix GGUF by mradermacher. Firmly in the “need real hardware” category. |
| DeepSeek V3 | 671B A37B | safetensors | Yes: DeepInfra, Novita | Hosted preferred | Massive MoE. 671B total, 37B active. Strong on code, math, and instruction-following. Pre-trained on ~15T tokens. Use via OpenRouter, not locally. |
| DeepSeek V3.2 | 685B A37B | safetensors | No confirmed provider yet | Hosted preferred | Successor to V3. Same general architecture class. Not a local play. |
| Behemoth-123B-v1 | 123B | GGUF | No | heavy self-host | Mistral-family 123B. Creative/RP community model. Massive parameter count makes it impractical for casual local use but prized for output quality in the r/LocalLLM community. |
| Monstral-123B | 123B | GGUF | No | heavy self-host | Mistral-family 123B. Text generation and chat focused. Same weight class as Behemoth, different training mix and community lineage. |
| BlackSheep-Large | ~27B | GGUF | No | Ollama / llama.cpp | By TroyDoesAI. Canonical repo is gated. Q8_0 is ~29.5 GB, placing it in the 27B-class. Community RP/creative model. |
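For the “Ollama / llama.cpp” path most rows point to, a minimal llama-cpp-python sketch looks like this; the model path is a placeholder for whichever GGUF from the table you download (e.g. from Hugging Face):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/some-model.Q4_K_M.gguf",  # placeholder filename
    n_ctx=8192,        # context window; match the model card's limit
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Introduce yourself in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```

With Ollama the equivalent is pulling or importing the GGUF and chatting from the CLI; llama-cpp-python is shown here because it keeps the whole path in one self-contained script.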
So much in this release, but the one many have been waiting for above the rest: the GUI dashboard! Manage and monitor your Hermes Agent with a local web dashboard; start it with the `hermes dashboard` command!
https://x.com/Teknium/status/2043771509123232230
Is there somewhere a collection of the best agent/coding harnesses for each model, especially open-source and local ones? In my opinion, the biggest reason people are struggling with open/local models these days is that the agent/coding harnesses in most open agents are not
https://x.com/ClementDelangue/status/2044139560355901911
Today we’re releasing Personal Computer. Personal Computer integrates with the Perplexity Mac App for secure orchestration across your local files, native apps, and browser. We’re rolling this out to all Perplexity Max subscribers and everyone on the waitlist starting today.
https://x.com/perplexity_ai/status/2044805973085454518
2-bit Qwen3.6-35B-A3B did a complete repo bug hunt with evidence, repro, fixes, tests and a PR writeup. 🔥 Run it locally in Unsloth Studio with just 13GB RAM. 2-bit Qwen3.6 GGUF made 30+ tool calls, searched 20 sites and executed Python code. GitHub:
https://x.com/UnslothAI/status/2044858346948464743
Qwen3.6-35B-A3B can now be run locally!💜 The model is the strongest mid-sized LLM on nearly all benchmarks. Run on 23GB RAM via Unsloth Dynamic GGUFs. GGUFs to run:
https://t.co/VlyW8UwDjw Guide:
https://x.com/UnslothAI/status/2044786492451778988
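The RAM figures quoted in these two posts line up with simple quantization arithmetic. A sketch, assuming ~35B total parameters, an illustrative ~4 GB runtime overhead (KV cache, activations, buffers), and effective bit-widths chosen to match the quoted numbers; real Unsloth dynamic quants mix bit-widths per layer, so these are estimates, not exact sizes:

```python
total_params = 35e9   # Qwen3.6-35B-A3B total parameter count

def est_ram_gb(bits_per_weight, overhead_gb=4.0):
    # weights + a rough allowance for KV cache, activations, and buffers
    return total_params * bits_per_weight / 8 / 1e9 + overhead_gb

print(f"~2-bit:   {est_ram_gb(2.0):.0f} GB   (post quotes 13GB RAM)")
print(f"~4.3-bit: {est_ram_gb(4.3):.0f} GB   (post quotes 23GB RAM)")
```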
Medical AI models now run on iPhone. No cloud. No API. OpenMed 1.0.0 just shipped. MLX backend for Apple Silicon. Swift package for macOS and iOS. 200+ PII detection models across 8 languages. `pip install openmed`. Open source. Apache 2.0.
https://x.com/MaziyarPanahi/status/2044037968659103806
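Since OpenMed’s models are published on Hugging Face, one hedged way to try a PII-detection checkpoint is the standard transformers token-classification pipeline. The model id below is a placeholder pattern, not a verified checkpoint name, and openmed’s own Python/MLX package may offer a more direct route:

```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="OpenMed/example-pii-detection-model",  # placeholder id: pick a real one from the OpenMed hub page
    aggregation_strategy="simple",                # merge sub-word tokens into entity spans
)

text = "Patient John Doe, DOB 01/02/1980, called from +1-555-0100."
for ent in ner(text):
    print(ent["entity_group"], "->", ent["word"])
```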