Image created with gemini-2.5-flash-image with claude-sonnet-4-5. Image prompt: A grand courtroom with black and white checkered marble floor forming a giant chessboard, lawyers and judges positioned as elegant chess pieces, ornate wooden benches and witness stand aligned to the grid, warm judicial lighting from tall windows, gavels resting on chess piece bases, dramatic perspective showing the strategic legal battlefield.
We’ve built tooling to help you build and ship an e2e document agent in < 5 mins 📄🤖 – and the core is available to everyone! A lot of data is within docs. We already have incredible OCR tooling over docs. The next step is knowledge automation. There’s a lot of agentic https://x.com/jerryjliu0/status/1980759684916408443
Massively unexpected update from DeepSeek: a powerful, high-compression MoE OCR model. > In production, DeepSeek-OCR can generate 33 million pages of data per day for LLMs/VLMs using 20 nodes (x8 A100-40G). They want ALL the tokens. You’re welcome to have some too. https://x.com/teortaxesTex/status/1980160624140456370
DeepSeek released an OCR model today. Their motivation is really interesting: they want to use visual modality as an efficient compression medium for textual information, and use this to solve long-context challenges in LLMs. Of course, they are using it to get more training https://x.com/iScienceLuvr/status/1980247935700066468
For people thinking that DeepSeek-OCR is the first model to render text as images, the University of Copenhagen already did this in 2023. Paper is called “Language Modelling with Pixels”. They trained a Masked AutoEncoder (MAE) by rendering text as images and masking patches https://x.com/NielsRogge/status/1980559120760791125
We’re seeing a lot of usage around DeepSeek’s new OCR model. Alex packaged it so you can deploy and test it yourself – prompts and sample images included. https://x.com/basetenco/status/1980924381217104338
DeepSeek-OCR looks impressive, but its core idea is not new. Input “Text” as “Image” — already explored by: LANGUAGE MODELING WITH PIXELS (Phillip et al., ICLR 2023) CLIPPO: Image-and-Language Understanding from Pixels Only (Michael et al. CVPR 2023) Pix2Struct: Screenshot https://x.com/awinyimgprocess/status/1980506449706119642
A more serious thread on the DeepSeek-OCR hype / serious misinterpretation going on. 1. On token reduction via representing text in images, researchers from Cambridge have previously shown that 500x prompt token compression is possible (ACL’25, Li, Su, and Collier). Without https://x.com/Kangwook_Lee/status/1980709454522744902
DeepSeek finally released a new model and paper. And because this DeepSeek-OCR release is a bit different from what everyone expected, and DeepSeek releases are generally a big deal, I wanted to do a brief explainer of what it is all about. In short, they explore how vision https://x.com/rasbt/status/1980642191950090585
I quite like the new DeepSeek-OCR paper. It’s a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn’t matter. The more interesting part for me (esp as a computer vision at heart who is temporarily masquerading as a natural language https://x.com/karpathy/status/1980397031542989305
🚨 DeepSeek just did something wild. They built an OCR system that compresses long text into vision tokens literally turning paragraphs into pixels. Their model, DeepSeek-OCR, achieves 97% decoding precision at 10× compression and still manages 60% accuracy even at 20×. That https://x.com/godofprompt/status/1980233080213590326
Letsss gooo! DeepSeek just released a 3B OCR model on Hugging Face 🔥 Optimised to be token efficient AND scale ~200K+ pages/day on A100-40G Same arch as DeepSeek VL2 Use it with Transformers, vLLM and more 🤗 https://x.com/reach_vb/status/1980170192392270227
a bunch of OCR models released in past few weeks: ~ deepseek-ocr-3b ~ olmo-ocr-2-7b ~ chandra-ocr-8b ~ nanonets-ocr2-3b ~ paddleocr-vl-0.9B ~ qwen3-vl-dense/moe (general vlm) ~ dots.ocr-3b Will be dropping a detailed comparison soon https://x.com/HarveenChadha/status/1981055277408669934
NEW DeepSeek OCR model that outperforms dots ocr while prefilling 3x less tokens https://x.com/casper_hansen_/status/1980166248878203093
DeepSeek-OCR has some weird architectural choices for the LLM decoder: DeepSeek3B-MoE-A570M -> uses MHA, no MLA (not even GQA?) -> 2 shared experts (like DeepSeek V2, but V3 only has 1) -> quite low sparsity, activation ratio is 12.5%. For V3 it’s 3.52%, for V2 it’s 5% -> not https://x.com/eliebakouch/status/1980193125202083951
I think Glyph coming out on the same day a) corroborates the results of DeepSeek OCR b) confirms the “they had it lying around for a while” suspicion. Charitably, they learned of Zhipu’s project retracing their steps and sped up the release. Other possibilities are obvious too. https://x.com/teortaxesTex/status/1980642000006451348
deepseek-ai/DeepSeek-OCR: Contexts Optical Compression https://github.com/deepseek-ai/DeepSeek-OCR
what happened this week with OCR and VLMs? * deepseek-ocr * chandra-ocr * nanonets-ocr2 * paddleocr-vl * qwen3-vl (2B, 32B, Instruct and Thinking) * dots.ocr * olmOCR 2 (based on Qwen2.5-VL) * LightOnOCR (smallies) top 5 trending models on @huggingface are still OCR/VLM! https://x.com/MaziyarPanahi/status/1981421331053760775
DeepSeek-OCR Contexts Optical Compression https://x.com/_akhaliq/status/1980260630780162505
DeepSeek OCR dropped … but honestly, Glyph [1], released the same day, showed something more interesting: 3–4× context compression and infilling cost reduction, no performance hit on long-context QA and summarization, which is much less trivial than OCR in many cases. If that https://x.com/arankomatsuzaki/status/1980722682246398069
🚀 DeepSeek-OCR — the new frontier of OCR from @deepseek_ai , exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8.5 for day-0 model support. 🧠 Compresses visual contexts up to 20× while keeping https://x.com/vllm_project/status/1980235518706401405
Sneak Peek: Likeness Detection – YouTube https://www.youtube.com/watch?v=zVqQiBb0F-w
We’re releasing the full FinePdfs source code — plus new datasets and models! 🚀 📚 Datasets: • OCR-Annotations — 1.6k PDFs labeled for OCR need • Gemma-LID-Annotation — 20k samples per language (annotated with Gemma3-27B) 🤖 Models: • XGB-OCR — OCR classifier for PDFs https://x.com/HKydlicek/status/1980319822585143498
Wow OCR models are taking off in vLLM 😍 Small but powerful 💪 Enjoy this fast OCR model from @staghado ✌️ https://x.com/vllm_project/status/1981579850436751611
Moondream 3 can parse complex parking signs in one step. Prompt: “extract sign details” → JSON of each rule + transcription. No OCR stack, no regex: just vision that understands structure. ⚡️Fast, cheap, grounded vision AI. https://x.com/moondreamai/status/1980405287531254089
We’re updating olmOCR, our model for turning PDFs & scans into clean text with support for tables, equations, handwriting, & more. olmOCR 2 uses synthetic data + unit tests as verifiable rewards to reach state-of-the-art performance on challenging documents. 🧵 https://x.com/allen_ai/status/1981029159267659821
There’s been a crazy OCR mania for the last couple of days 👀 And you can 1-click deploy most of these models directly from the Inference Endpoints catalog 🔥 https://x.com/ErikKaum/status/1981750508982268330
one of the big motivations behind olmOCR 2’s use of RLVR with binary unit tests. the ability to easily define unit tests for model failures + retrain makes iteration really easy tech report out 👉 https://x.com/kylelostat/status/1981380820658180310
You might have seen a lot of OCR release recently… Here is another one, introducing 🦉 LightOnOCR-1B A fully end-to-end differentiable VLM model competing with all the latest releases while being much faster🚀 https://x.com/staghado/status/1981379888301867299
olmOCR – Open-Source OCR for Accurate Document Conversion https://olmocr.allen.ai/blog
Deploy your favorite OCR models with few-clicks directly from Hugging Face 🔥 📷we’ve added the latest bleeding edge OCR models to the Inference Endpoints catalog to make it easy for you to get started! links 👇 https://x.com/ErikKaum/status/1980965155145216336
there’s a new OlmOCR model that outperforms other OCR models, with Apache 2.0 license 🔥 and it costs only $178 to parse million pages 🤯 https://x.com/mervenoyann/status/1981040748133826918
I’m excited to announce that Chandra OCR is open source! – Full layout information – Extracts and captions images and diagrams – Strong handwriting, form, table support – Works with transformers and vLLM https://x.com/VikParuchuri/status/1980667137606971423
we just updated the model comparison on our blog for you 🫡 added Chandra, OlmOCR-2, Qwen3-VL and their averaged OlmOCR score! https://x.com/mervenoyann/status/1981396054634615280
HF Datasets: built for audio, images, videos… And now, PDFs 📕 Still loadable in one line of code: >>> load_dataset("username/my_dataset") What should we do next for OCR datasets ? 🤗 https://x.com/lhoestq/status/1981720383620358449
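A quick arithmetic check on the numbers quoted in the posts above: the DeepSeek-OCR cluster figure (33 million pages per day on 20 nodes of 8 A100-40G each) and the per-GPU figure (~200K+ pages/day on A100-40G) appear to describe the same throughput. The sketch below uses only figures quoted in those posts; the function and constant names are illustrative, nothing is measured independently, and the cost line refers to olmOCR 2, a different model.

```python
# Figures below are copied from the quoted posts; names are illustrative only.
PAGES_PER_DAY_CLUSTER = 33_000_000  # DeepSeek-OCR: "33 million pages of data per day"
NODES = 20                          # "20 nodes"
GPUS_PER_NODE = 8                   # "x8 A100-40G"
COST_PER_MILLION_PAGES_USD = 178    # olmOCR 2: "$178 to parse million pages"


def pages_per_gpu_per_day(total_pages: int, nodes: int, gpus_per_node: int) -> float:
    """Divide cluster-wide daily throughput evenly across all GPUs."""
    return total_pages / (nodes * gpus_per_node)


def cost_per_page_usd(cost_per_million: float) -> float:
    """Convert a cost per million pages into a cost per single page."""
    return cost_per_million / 1_000_000


per_gpu = pages_per_gpu_per_day(PAGES_PER_DAY_CLUSTER, NODES, GPUS_PER_NODE)
print(f"{per_gpu:,.0f} pages/day per GPU")  # 206,250, in line with "~200K+ pages/day"
print(f"${cost_per_page_usd(COST_PER_MILLION_PAGES_USD):.6f} per page")  # $0.000178
```

The even split across GPUs is an assumption; the posts do not say how work is distributed across the cluster.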




