Image created with gemini-2.5-flash-image and claude-sonnet-4-5. Image prompt: Top-down isometric pixel art scene of a data center as a city grid in PS1 GTA style, chunky server rack sprites arranged like buildings, glowing network cables as neon streets, tiny technician sprites moving between racks, data packets flowing like vehicles, saturated blues and electric greens against dark floor, CRT glow effect, 32-bit aesthetic with visible pixels and dithered textures.
AI agents have gotten good enough at long-horizon tasks that it is an inflection point in the impact of AI at work. Agreement on this from METR, GDPval & now Anthropic. If you have a tool that saves 8 hours 65% of the time, that changes work, even counting potential error rates. https://x.com/emollick/status/2012237630411292859
Gartner Says Worldwide AI Spending Will Total $2.5 Trillion in 2026 https://www.gartner.com/en/newsroom/press-releases/2026-1-15-gartner-says-worldwide-ai-spending-will-total-2-point-5-trillion-dollars-in-2026
AI data centers can now use as much power as New York State uses on the hottest days of the year. We find that data centers currently have a total capacity of around 30 GW. https://x.com/EpochAIResearch/status/2012303496465498490
Benchmarking AI Agent Memory: Is a Filesystem All You Need? | Letta https://www.letta.com/blog/benchmarking-ai-agent-memory
Does anyone know of an "intro to filesystems for smart non-computer people who never had to use a terminal or even really folders" that I could give to students who grew up largely not having to know this stuff? Everything 101 I can find is too easy & doesn't explain concepts. https://x.com/emollick/status/2013094876389187949
Frontier AI Auditing: Toward Rigorous Third-Party Assessment of Safety and Security Practices at Leading AI Companies — AVERI https://www.averi.org/ourwork/frontier-ai-auditing
Inference startup Inferact lands $150M to commercialize vLLM | TechCrunch https://techcrunch.com/2026/01/22/inference-startup-inferact-lands-150m-to-commercialize-vllm/
Only vibes”” is basically the definition of expert. (Usually its called unconscious heuristics, but same thing)”” https://x.com/emollick/status/2012358857683554734
What if your bricklayer could fly? Researchers at Imperial College London tested a new way to build using cooperating drones instead of cranes or scaffolding. Two drones work together. One lays down foam and lightweight cement. The other checks accuracy while humans supervise. https://x.com/IlirAliu_/status/2012963986069745862
How well did forecasters predict 2025 AI progress? According to @aidigest_'s survey, forecasters: – Mostly nailed benchmark scores – Underestimated risks from AI-enabled bioweapons – Underestimated revenue by almost 2× – Overestimated public concern about AI. Details in 🧵 https://x.com/EpochAIResearch/status/2012264230028984493
Thoughts on Evals – Raindrop Blog https://www.raindrop.ai/blog/thoughts-on-evals
Without Benchmarking LLMs, You’re Likely Overpaying 5-10x | Karl Lorey https://karllorey.com/posts/without-benchmarking-llms-youre-overpaying
Community Benchmarks: Evaluating modern AI on Kaggle https://blog.google/innovation-and-ai/technology/developers-tools/kaggle-community-benchmarks/
Since OpenAI didn’t update Figure 7 from GDPval given the success rate of GPT-5.2 on long-form tasks, I used GPT-5.2 Pro to do so. The chart assumes the process is: delegate long tasks to AI, evaluate the output for an hour, then decide to try again or give up & do it yourself. https://x.com/emollick/status/2013243362229256550
Small models, big results: Achieving superior intent extraction through decomposition https://research.google/blog/small-models-big-results-achieving-superior-intent-extraction-through-decomposition/
Differential Transformer V2 https://huggingface.co/blog/microsoft/diff-attn-v2
📈👨🏻‍💻 https://x.com/alexandr_wang/status/2013403027655532672
.@CarnegieMellon and @AIatMeta made something extraordinary with Transformers that everyone is now talking about. ➡️ STEM (Scaling Transformers with Embedding Modules) scales a Transformer’s parametric memory without routing, instability, and extra compute: • it removes ~1/3… https://x.com/TheTuringPost/status/2013011864880660495
Today’s models generally force context into a rigid 1-2-3 order. @SakanaAILabs introduced something more realistic – a RePo (context re-positioning) mechanism. RePo learns positions based on context structure, capturing how pieces of information are actually related. This… https://x.com/TheTuringPost/status/2012400184219881840
Introducing OptiMind, a research model designed for optimization https://huggingface.co/blog/microsoft/optimind
[2601.08763] Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs https://arxiv.org/abs/2601.08763
[2601.14750] Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning https://arxiv.org/abs/2601.14750
4 must-read research papers of the week: ▪️ STEM: Scaling transformers with embedding modules ▪️ Reasoning models generate societies of thought ▪️ Multiplex thinking: Reasoning via token-wise branch-and-merge ▪️ The Assistant axis: Situating and stabilizing the default persona of… https://x.com/TheTuringPost/status/2013756588348317896
A free comprehensive textbook – Linear Algebra for Computer Vision, Robotics, and Machine Learning. Covers both essential theory and computation: – vector spaces, matrices, norms – eigenvalues, SVD, numerical algorithms – applications like PCA, graphs, wavelets, and 3D… https://x.com/TheTuringPost/status/2012674673423876602
Having used a lot of LLMs, it is clear that there was a real advantage to being a well-regarded source on the internet in the early 2020s, an opportunity to get AIs to respect you that may never come again. When they can, models love using Our World in Data, CEPR, NBER, etc. https://x.com/emollick/status/2013822074205139410
I don’t really understand how so much careful work goes into a study and then it uses non-frontier LLMs with no better models as comparisons. At this point, we know the trend line: if a weaker model is close, a good model can likely pull it off. We don’t learn much from it. https://x.com/emollick/status/2011973504531358189
Optimizing GLM4-MoE for Production: 65% Faster TTFT with SGLang | LMSYS Org https://lmsys.org/blog/2026-01-21-novita-glm4/
Pass@k is Mostly Bunk – Marc’s Blog https://brooker.co.za/blog/2026/01/21/pass-k.html
Pipeline Parallelism in SGLang: Scaling to Million-Token Contexts and Beyond | LMSYS Org https://lmsys.org/blog/2026-01-15-chunked-pipeline/
Slonk: Slurm on Kubernetes for ML Research at Character.ai https://blog.character.ai/slonk/
There are many ideas on how to build better world models, and one of the recent ones comes from @Princeton with Web World Models (WWMs). ▪️ The key principle: separate rules from imagination. WWM builds around these two pieces: 1. The physical layer is handled by code. It’s… https://x.com/TheTuringPost/status/2013016473514717330
Actually, the majority of studies so far have found that AI reduces inequality and closes skill gaps. There are relatively few that found the opposite – that experts got even better. One was compelling enough to build a narrative that AI would increase the skill gap. But it wasn’t real. https://x.com/alexolegimas/status/2012334799998861451
The Commoditization of Services – by Carl Cortright https://blog.excel.holdings/p/the-commoditization-of-services
Yes, that is the whole reason that fraudulent paper was such a big deal to those who study AI. Most other research (including our academic papers studying Boston Consulting Group using AI) found that AI reduced performance gaps. That may change as models get better, but not yet. https://x.com/emollick/status/2012336006989557788
Our 2025 Impact Report is out. The AI industry is scaling exponentially – investment, compute, data center buildouts. So, it turns out, is demand for making sense of it all. See how we’ve kept up! https://x.com/EpochAIResearch/status/2012226390461132824
Korea Kicks Off AI Squid Game for Best Sovereign Foundation Models – Bloomberg https://www.bloomberg.com/news/features/2026-01-19/korea-kicks-off-ai-squid-game-for-best-sovereign-foundation-models
So many notable model releases this week: Open-source: ▪️ Molmo2 ▪️ Ministral 3 ▪️ TranslateGemma ▪️ STEP3-VL-10B ▪️ HeartMuLa Closed models: ▪️ OpenDecoder ▪️ UM-Text ▪️ Solar Open More about them in my newsletter: https://x.com/TheTuringPost/status/2014063765600477695
RePo: Language Models with Context Re-Positioning https://pub.sakana.ai/repo/