Image created with gemini-3.1-flash-image-preview with claude-opus-4.7.
Ollama
https://ollama.com/
r/LocalLLaMA + r/LocalLLM + r/SillyTavernAI preferred models list – April 2026
| Model | Size/Class | Format | Hosted Provider | Best Local Path | Notes |
|---|---|---|---|---|---|
| Huihui Gemma 4 E2B Abliterated v2 | E2B | GGUF | No | Ollama / llama.cpp | Gemma 4 MoE with ~2B active params. Multimodal (image+text in, text out). Abliterated for reduced refusal. Lightweight enough to run fast, but MoE active-param sizing means quality punches above its weight class. |
| Huihui Gemma 4 E4B Abliterated | E4B | GGUF | No | Ollama / llama.cpp | Same Gemma 4 MoE family as E2B but with ~4B active params. Multimodal. Better quality ceiling than E2B at the cost of more compute per token. |
| SultrySilicon V2 | 7B | GGUF | No | Ollama / llama.cpp | Roleplay-focused 7B model. One of the smaller models in the set. Good for quick creative/RP sanity checks, not for reasoning or instruction-following benchmarks. |
| Huihui-GLM-4.6V-Flash-Abliterated | 9B | GGUF | No | Ollama / llama.cpp | Based on Z.ai GLM-4.6V-Flash. Vision-language model (image+text). Abliterated. Bilingual Chinese/English. Fast inference variant of the GLM-4.6V family. |
| Gemma-2-Ataraxy-9B | 9B | GGUF | No | Ollama / llama.cpp | Merge of Gemma-2-9B-SimPO and Gemma-2-Gutenberg-9B. Creative writing and roleplay oriented. Scored well on EQ-Bench. Good balance of instruction-following and literary quality at 9B. |
| MythoMax-L2-13B | 13B | GGUF | No | Ollama / llama.cpp | By Gryphe. Llama 2 merge of MythoLogic-L2 and Huginn using experimental per-tensor gradient merging. One of the most downloaded RP/creative models ever (~59k GGUF downloads). Strong at both roleplay and storywriting. Alpaca format. The OG. |
| Dan’s PersonalityEngine V1.3.0 | 24B | GGUF | No | Ollama / llama.cpp | Fine-tuned from Mistral Small 3.1 24B Base. Trained on a massive mix: roleplay, storywriting, tool use, math, reasoning, code, medical, legal, and survival topics. Multilingual (EN, AR, DE, FR, ES, HI, PT, JA, KO). A genuine generalist with personality. |
| SuperGemma4 26B Abliterated Multimodal | 26B multimodal | GGUF | No | custom multimodal stack | Based on Gemma 4 26B-A4B. Multimodal (image-text-to-text). Abliterated with low refusal. Optimized for Apple Silicon (MLX). Supports Korean + English. Tool use and coding tags. |
| Gemma 3 27B Abliterated | 27B | GGUF | No | Ollama / llama.cpp | Abliterated version of Google’s Gemma 3 27B instruct. Multimodal (image-text-to-text). Reduced refusal behavior while preserving instruction-following quality. |
| Huihui Gemma 4 31B Abliterated | 31B | GGUF | No | Ollama / llama.cpp | Abliterated Gemma 4 31B instruct. Multimodal (any-to-any pipeline tag). Dense 31B, not MoE. Strongest Gemma 4 dense abliterated option. |
| Gemma 4 31B Abliterated | 31B | GGUF + safetensors | No | Ollama / llama.cpp | Same base as above (Gemma 4 31B-it) but different abliteration method using mlabonne’s harmful_behaviors + harmless_alpaca datasets. Both formats in one repo. |
| Huihui-Qwen3.5-35B-A3B-Claude-4.6-Opus-Abliterated | 35B A3B | GGUF | No | Ollama / llama.cpp | Qwen 3.5 MoE (35B total, ~3B active). Distilled from Claude 4.6 Opus reasoning. Chain-of-thought and reasoning-focused. Abliterated. Multimodal. Punches well above its active param count on reasoning tasks. |
| Midnight Rose 70B v2.0.3 | 70B | GGUF | No | Ollama / llama.cpp | By sophosympatheia. Complex multi-stage SLERP/DARE-TIES merge of WizardLM, Tulu-2-DPO, Dolphin, and earlier Midnight Rose versions. Uncensored. Designed for roleplay and storytelling. Scored surprisingly high on EQ-Bench even at low quants. ~6k context sweet spot. |
| Midnight Miqu 70B v1.5 | 70B | GGUF | No | Ollama / llama.cpp | Llama-family merge of Midnight-Miqu v1.0 and Tess-70B. Creative writing and roleplay focused. 32k context. Known for strong prose quality and character consistency at 70B scale. |
| Midnight Rose 103B v2.0.3 | 103B | GGUF | No | heavy self-host | Same lineage as the 70B but scaled up. Importance-matrix GGUF by mradermacher. Firmly in the “need real hardware” category. |
| DeepSeek V3 | 671B A37B | safetensors | Yes: DeepInfra, Novita | Hosted preferred | Massive MoE. 671B total, 37B active. Strong on code, math, and instruction-following. Pre-trained on ~15T tokens. Use via OpenRouter, not locally. |
| DeepSeek V3.2 | 685B A37B | safetensors | No confirmed provider yet | Hosted preferred | Successor to V3. Same general architecture class. Not a local play. |
| Behemoth-123B-v1 | 123B | GGUF | No | heavy self-host | Mistral-family 123B. Creative/RP community model. Massive parameter count makes it impractical for casual local use but prized for output quality in the r/LocalLLM community. |
| Monstral-123B | 123B | GGUF | No | heavy self-host | Mistral-family 123B. Text generation and chat focused. Same weight class as Behemoth, different training mix and community lineage. |
| BlackSheep-Large | ~27B | GGUF | No | Ollama / llama.cpp | By TroyDoesAI. Canonical repo is gated. Q8_0 is ~29.5 GB, placing it in the 27B-class. Community RP/creative model. |
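
The BlackSheep-Large row infers a size class from the Q8_0 file size. A minimal sketch of that arithmetic follows, assuming Q8_0 stores each block of 32 weights as 32 int8 values plus one 16-bit scale (34 bytes per block, about 8.5 bits per weight) and that "GB" means decimal gigabytes. The function name `estimate_params_billions` is hypothetical, and the estimate ignores file metadata and any tensors kept at higher precision, so it is only a rough guide.

```python
# Rough parameter-count estimate from a GGUF file size.
# Assumption: Q8_0 packs 32 weights into 34 bytes (32 int8 values + one fp16 scale).
Q8_0_BITS_PER_WEIGHT = 34 * 8 / 32  # 8.5 bits per weight


def estimate_params_billions(
    file_size_gb: float,
    bits_per_weight: float = Q8_0_BITS_PER_WEIGHT,
) -> float:
    """Estimate the parameter count, in billions, from a file size in decimal GB.

    Metadata and higher-precision tensors are ignored, so the result
    slightly overstates the true parameter count.
    """
    bytes_per_weight = bits_per_weight / 8
    return file_size_gb / bytes_per_weight


if __name__ == "__main__":
    # File size taken from the BlackSheep-Large row above.
    print(f"{estimate_params_billions(29.5):.1f}B")  # prints 27.8B
```

Under these assumptions a ~29.5 GB Q8_0 file works out to roughly 27.8B parameters, which is consistent with the 27B-class label in the table.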




