Image created with gemini-3.1-flash-image-preview and claude-sonnet-4-5. Image prompt: Animation cel style illustration of a muscular blue-skinned genie emerging from a golden oil lamp on the deck of a modern deep-sea research vessel, magical teal wisps flowing downward toward dark ocean waters below, warm spotlight on the lamp and genie contrasting with deep blue-black ocean depths, clean composition with horizontal space for title text, Disney-quality hand-drawn aesthetic with bold outlines and volumetric magical effects.
"The most interesting insight in DeepSeek-OCR 2 is how it presents a *learnable* raster order, similar to how humans scan contiguous elements in a document, instead of a 'dumb' left-to-right raster order: 1. A vanilla transformer would encode the image left-to-right…" https://x.com/jerryjliu0/status/2016319238974407146
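The contrast described in the quote above can be sketched in a few lines. This is a toy illustration, not DeepSeek's actual mechanism: the fixed raster order simply enumerates patches left-to-right, top-to-bottom, while the "learned" order is stood in for by a random scoring head (`W_score` is a hypothetical placeholder for trained parameters) that reorders patches before the decoder consumes them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "document image": a 3x3 grid of patch embeddings, dimension 4.
patches = rng.normal(size=(9, 4))

# Fixed raster order: patches consumed left-to-right, top-to-bottom.
raster_order = np.arange(9)

# Learned ordering (sketch): a scoring head assigns each patch a priority,
# and patches are sorted by score before decoding. W_score stands in for
# learned parameters and is purely hypothetical here.
W_score = rng.normal(size=(4,))
scores = patches @ W_score
learned_order = np.argsort(-scores)  # highest-priority patch first

print("raster :", raster_order.tolist())
print("learned:", learned_order.tolist())
```

The point of the sketch is only that the reading order becomes a function of the content rather than a constant, which is what "learnable raster order" suggests.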
"Third-party eval: DeepSeek-OCR 2 is, practically speaking, about on par with dots.ocr, which is a good model but nowhere near SOTA at this point. I think it'll mainly be interesting for how much of its ideas make it into the final multimodal product." https://x.com/teortaxesTex/status/2016179572056678739
"Embedding parameters are hot again; amazing paper from LongCat Flash, concurrent with DeepSeek's Engram! Differences with Engram: -> no per-layer embedding (they tried per-layer embedding (PLE) but saw no real gains) -> simple averaging fusion instead of Engram's dynamic…" https://x.com/eliebakouch/status/2016577949676319092
DeepSought https://spyglass.org/deepseek-moment/
"🚀 DeepSeek-OCR 2 — introducing Visual Causal Flow from @deepseek_ai, learning to read documents the way humans do — now running on vLLM ⚡ with vllm==0.8.5 day-0 support. 🧠 Replaces fixed raster scanning with learned causal token reordering via DeepEncoder V2. 📄 16× visual…" https://x.com/vllm_project/status/2016065526058090967
"New DeepSeek-OCR-2 model! 1. Utilizes Qwen2 500M as a vision encoder instead of ViT 300M 2. Combines a causal mask with a non-causal mask 3. Accuracy boosted by 3.73% (87.36% → 91.09%) 4. Edit distance 0.100 vs 0.129 for OCR v1. And we added DS-OCR-2 fine-tuning support in Unsloth!" https://x.com/danielhanchen/status/2016043326760485313
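Point 2 above (a causal mask combined with a non-causal mask) is most likely the common hybrid-attention pattern in vision-language decoders: vision tokens attend bidirectionally among themselves, while text tokens attend causally. That reading is an assumption on my part; the sketch below just builds such a mask, with the token counts chosen arbitrarily for illustration.

```python
import numpy as np

# Assumed layout: n_vis vision tokens come first, then n_txt text tokens.
n_vis, n_txt = 4, 3
n = n_vis + n_txt

# Start from a standard causal (lower-triangular) mask: True = may attend.
mask = np.tril(np.ones((n, n), dtype=bool))

# Lift causality inside the vision block: vision tokens see each other
# bidirectionally, while text tokens remain strictly causal and vision
# tokens still cannot see any text token.
mask[:n_vis, :n_vis] = True

print(mask.astype(int))
```

The result is block-structured: a full-ones vision block in the top-left, zeros where vision rows meet text columns, and a lower-triangular text block in the bottom-right.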




