Ethan B. Holland

Over 51,300 manually organized AI links and counting

Benchmarks: AI News Week Ending 01/09/2026

January 9, 2026

Image created with gemini-2.5-flash-image with claude-sonnet-4-5. Image prompt: Minimalist Anthropic-style editorial illustration of a thick black measuring tape curved into a percentage symbol, hand-drawn wobbly line art, visible measurement markings on tape, centered on warm off-white background with subtle paper grain texture, flat vector shapes with rounded corners, high contrast thick strokes, lots of negative space, 16:9 landscape with vertical split composition

The many eulogies for AI capability growth after the release of GPT-5 seem especially short-sighted right now. Letting people nervous about AI feel they can safely ignore AI development because it was pure hype that would never have any real impact is not a good thing for anyone. https://x.com/emollick/status/2018141446683930811

AI leaderboard maker LMArena hits $1.7 billion valuation – Sherwood News https://sherwood.news/tech/ai-leaderboard-maker-lmarena-hits-usd1-7-billion-valuation/

Fueling the World’s Most Trusted AI Evaluation Platform https://arena.ai/blog/series-a/

LMArena has raised $150M+ at a valuation of $1.7B+ 💪🏼 In the past 7 months, @arena has: Grown our userbase 25x. 35M+ unique users. Grown our revenue from 0 to >>$30M+ ARR in 4 months. Our products help labs and enterprises measure the real utility of AI and understand their… https://x.com/ml_angelopoulos/status/2008577473450250441

The industry is shifting from asking “What can this model do?” to “Can I trust it?” LMArena’s $150M raise underscores the growing need for independent, transparent, real-world evaluation frameworks that ensure AI systems meet the rigorous reliability and trust requirements of… https://x.com/istoica05/status/2008575786169889132

Today, we’re excited to announce our $150M Series A at a $1.7B valuation–nearly 3× our May seed round. Since launching evaluations in Sept, our annualized consumption run rate has surpassed $30M. Our mission is clear: to measure and advance the frontier of AI for real-world use,… https://x.com/arena/status/2008571061961703490

8 plots that explain the state of open models https://www.interconnects.ai/p/8-plots-that-explain-the-state-of

AA-Omniscience is our benchmark measuring factual recall and hallucination across 6,000 questions covering 42 economically relevant topics within 6 domains (Business, Health, Law, Software Engineering, Humanities & Social Sciences, and Science/Engineering/Mathematics). Models… https://x.com/ArtificialAnlys/status/2008570655047118914

New year, new Artificial Analysis Intelligence Index! Announcing Intelligence Index v4.0: incorporating 3 new evaluations, further aligning to real-world use and reducing saturation. The Artificial Analysis Intelligence Index is our synthesis metric for assessing generalist model… https://x.com/ArtificialAnlys/status/2008570646897573931

Shovel-ready short investigations seek funding! – How is AI’s adoption varying across roles, sectors, & regions? – Trends & bottlenecks for data supply? (incl. synthetic and RL environments) – Forecast for worldwide compute buildout? – Will inference costs continue falling? 🧵 https://x.com/EpochAIResearch/status/2018767169094566208

Brendan Foody on Teaching AI and the Future of Knowledge Work (Ep. 267) | Conversations with Tyler https://conversationswithtyler.com/episodes/brendan-foody/

LLMs as Judges: Measuring Bias, Hinting Effects, and Tier Preferences https://aashidutt.substack.com/p/llms-as-judges-measuring-bias-hinting?triedRedirect=true

Lord of War, meet Lord of Tokens: Torture-testing image models on design-agency grade work | KAY SINGH https://singhkays.com/blog/lord-war-test-image-models/

State of AI Data Connectivity Report: 2026 Outlook – CData Software https://www.cdata.com/lp/ai-data-connectivity-report-2026/