Image created with gemini-3.1-flash-image-preview with claude-sonnet-4-5. Image prompt: Using the provided reference image, preserve the exact square faceted perfume bottle composition with amber-gold liquid, crystal stopper, pure white background, soft shadow, and glass refractions. Replace the label text with ‘Tech’ in the same elegant black serif font. Add a delicate sterling silver chain draped naturally around the bottle neck with a small dainty pendant shaped like a miniature circuit board chip, rendered in high-fashion jewelry aesthetic–tiny, precise, refined like a Tiffany charm, not oversized or cartoonish.
🚀 Imagine running Claude 4.6 Opus-level reasoning… but entirely on your own GPU with just 16GB VRAM. This 27B Qwen3.5 variant, distilled on Claude 4.6 Opus reasoning traces, delivers frontier coding power locally. It’s beating Claude Sonnet 4.5 on SWE-bench in 4-bit
https://x.com/outsource_/status/2038999111039357302
This model has been #1 trending for 3 weeks now. It’s Qwen3.5-27B fine-tuned on distilled data from Claude-4.6-Opus (reasoning). Trained via Unsloth. Runs locally on 16GB in 4-bit or 32GB in 8-bit. Model:
https://x.com/UnslothAI/status/2038625148354679270
Very bullish on open source and local models. Imagine running a near-Opus-level model locally on that $600, 16GB Mac Mini you bought last month. This 27B Qwen3.5 distill was trained on Claude 4.6 Opus reasoning traces and is putting up real numbers: – beats Claude Sonnet 4.5 on
https://x.com/TheCraigHewitt/status/2039303217620627604
METR time horizons are doubling every ~107 days. Opus 4.6 reached 11.98 hours in February; today we should be at around ~15.2h, and by end of year ~87.4h. 90% CIs today (April 3rd, 2026): [11.64h, 21.88h]; EOY: [53.13h, 164.19h]
https://x.com/scaling01/status/2040047917306876325
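The projection above is a plain exponential with a 107-day doubling time. A minimal sketch of that extrapolation (the elapsed-day values you pass in are your own assumption; only the 11.98h baseline and 107-day constant come from the thread):

```python
def extrapolate_horizon(h0_hours, elapsed_days, doubling_days=107):
    """Project a METR-style time horizon forward, assuming it doubles
    every `doubling_days` days from a baseline of `h0_hours`."""
    return h0_hours * 2 ** (elapsed_days / doubling_days)

# One full doubling period after the 11.98h Opus 4.6 measurement:
print(extrapolate_horizon(11.98, 107))  # 23.96
```

Plugging in the actual day counts between the February measurement, today, and end of year reproduces point estimates in the same ballpark as the thread's.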
These functional emotions have real consequences. To build AI systems we can trust, we may need to think carefully about the psychology of the characters they enact, and ensure they remain stable in difficult situations. Read the full paper:
https://x.com/AnthropicAI/status/2039749660349239532
Anthropic on X: “We studied one of our recent models and found that it draws on emotion concepts learned from human text to inhabit its role as “Claude, the AI Assistant”. These representations influence its behavior the way emotions might influence a human.
https://x.com/AnthropicAI/status/2039749632238944336
Emotion concepts and their function in a large language model \ Anthropic
https://www.anthropic.com/research/emotion-concepts-function
We’ve added Pareto frontier charts to the leaderboard. Now available across: Text, Vision, Search, Document, and Code Arena. The Pareto frontier curve demonstrates which models are most efficient at their level of performance (by Arena score) vs. a blended price per 1M tokens
https://x.com/arena/status/2039377186432618885
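Arena computes its frontier from its own score and blended-price data, but the underlying dominance rule is simple; here is a toy sketch with made-up model names and numbers:

```python
def pareto_frontier(models):
    """models: iterable of (name, price_per_mtok, arena_score).
    A model is on the frontier if no other model is at least as cheap
    AND at least as good, with at least one of the two strictly better."""
    frontier = []
    for name, price, score in models:
        dominated = any(
            (p <= price and s >= score) and (p < price or s > score)
            for _, p, s in models
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Illustrative, invented numbers:
models = [("A", 1.0, 1200), ("B", 2.0, 1250), ("C", 3.0, 1240), ("D", 0.5, 1100)]
print(pareto_frontier(models))  # ['A', 'B', 'D'] — C is dominated by B
```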
NEW paper from Google DeepMind The biggest threat to AI agents isn’t a smarter attacker. It’s the web itself. This work introduces the first systematic framework for understanding how the open web can be weaponized against autonomous agents. The paper defines “AI Agent Traps”:
https://x.com/omarsar0/status/2039383554510217707
LLM Knowledge Bases Something I’m finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating
https://x.com/karpathy/status/2039805659525644595
AI Infrastructure Roadmap: Five frontiers for 2026
https://nextbigteng.substack.com/p/ai-infrastructure-roadmap-five-frontiers-for-2026
https://substackcdn.com/image/fetch/$s_!fUTt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95a0b60f-1046-43cf-aea2-7474ae91f9e2_2220x1252.png
https://substackcdn.com/image/fetch/$s_!I-qQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9606cbd9-743a-43f6-b6b1-8373fca7f8d4_1600x928.png
The State of Consumer AI. Part 3: Time is Money
https://apoorv03.com/p/the-state-of-consumer-ai-part-3-time
AI models will secretly scheme to protect other AI models from being shut down, researchers find – Yahoo News Canada
https://ca.news.yahoo.com/ai-models-secretly-scheme-protect-162555909.html?guccounter=1
I have long felt that agent harnesses – even claude code – are too restrictive, because they are still designed by humans. New paper from Tsinghua and Shenzhen says, what if AI itself runs the harness, rather than defining it in code? Given a natural language SOP of how an agent
https://x.com/rronak_/status/2038401494177694074
Collinear presents YC-Bench This benchmark evaluates agent capability to run a simulated startup over a one-year horizon spanning hundreds of turns.
https://x.com/arankomatsuzaki/status/2039541189968626047
evals rhyme with training data. the same rigor and care we put into data quality/curation for training should go into eval design. training data updates the weights of our models; each example contributes a weight push in some direction to correctly classify that datapoint. Evals
https://x.com/Vtrivedy10/status/2039029715533455860
I just published a blog that covers 30+ popular LLM evals / benchmarks and how they are created. Here are the common themes for success… For full details, find the blog post here:
https://t.co/sWSNkbCEhm (1) Domain Taxonomy. Most popular LLM benchmarks categorize their data
https://x.com/cwolferesearch/status/2039009111711367557
I really like the strategy used by CursorBench to evaluate Composer 2. Many good design decision: – Benchmark items are sourced from real coding sessions (from the Cursor team, so no issues with opt-in), which makes the evals realistic and less prone to contamination. – The
https://x.com/cwolferesearch/status/2037726856699420987
Introducing AA-AgentPerf – the hardware benchmark for the agent era. Key details: ➤ Real agent workloads, not synthetic queries: we’ve captured real coding agent trajectories where our agents used up to 200 turns and worked with sequence lengths >100K tokens ➤ Production
https://x.com/ArtificialAnlys/status/2037562417836929315
Introducing Contra Labs. The first frontier data and evaluation lab for Creative AI.
https://x.com/contraben/status/2039021014244262000?s=20
New conceptual guide: 🔄 The agent improvement loop starts with a trace Tracing is the foundational primitive for improving agents. A trace gives you the full behavioral record of what an agent actually did. From there, teams can enrich traces with evals and human feedback,
https://x.com/LangChain/status/2039028327030079565
Reasoning over Mathematical Objects Our 70-page(!) paper is out on arXiv, as covered by several of our recent blog posts. We study how to improve reasoning on hard tasks (e.g., math expressions) via: • better training data (& new evals) • better reward models (on-policy
https://x.com/jaseweston/status/2040062089725645039
Tau Bench got an update! Tau Bench is one of the most adopted Agentic Benchmarks. They now added “Banking” a fintech-inspired customer support domain built around a realistic knowledge base of 698 documents across 21 product categories. Tasks require agents to search this
https://x.com/_philschmid/status/2038655544613826985
The Agent Evaluation Readiness Checklist Starting to think through how to test your agents? We put together a step-by-step checklist for building, running, and shipping agent evals. 🧪 We walk through: → How to read traces in LangSmith and analyze errors, before building evals
https://x.com/LangChain/status/2037590936234959355
we’re leaning incredibly hard into Open Models + Open Harnesses. evals show that current open models get near-frontier (or better) intelligence on many tasks, they’re way cheaper, and usually faster. real world tasks need to take perf, cost, latency into account. many tasks don’t
https://x.com/Vtrivedy10/status/2039805753905840159
we’re leaning into the future of Agent Improvement with Traces, Evals, & Infra. the future will be deeply grounded in data so that we can win against slop. that means we’ll need to: – point smart agentic compute towards traces to surface and monitor errors – use human & agent
https://x.com/Vtrivedy10/status/2039035899938267334
Weekend over. Here’s what I built:
https://t.co/me1qexYWgw A simple agent-native CLI to parse, sanitise, and commit agent traces to public or private Hugging Face datasets for analytics, evals, and training. What I focused on: – a schema that is actually useful for downstream
https://x.com/jayfarei/status/2038385591818023278
Excited about our new paper: AI Agent Traps AI agents inherit every vulnerability of the LLMs they’re built on – but their autonomy, persistence, and access to tools create an entirely new attack surface: the information environment itself. The web pages, emails, APIs, and
https://x.com/FranklinMatija/status/2039001719007330530
NEW papers on self-organizing LLM Agents. Assign an agent a role, and it’ll follow instructions. Let agents figure out roles themselves, and they’ll outperform your design. New research tested this across 25,000 tasks with up to 256 agents. The work shows that self-organizing
https://x.com/dair_ai/status/2039350842382512455
NEW research from CMU. (bookmark this one) The biggest unlock in coding agents is understanding strategies for how to run them asynchronously. Simply giving a single agent more iterations helps, but does not scale well. And multi-agent research shows that coordination >
https://x.com/omarsar0/status/2038627572108743001
This work from @voooooogel was pretty ground-breaking:
https://x.com/jeremyphoward/status/2039880485036544422
Apple Research just published something really interesting about post-training of coding models. You don’t need a better teacher. You don’t need a verifier. You don’t need RL. A model can just… train on its own outputs. And get dramatically better. Simple Self-Distillation
https://x.com/BoWang87/status/2039943931543331237
Why it’s getting harder to measure AI performance
https://www.understandingai.org/p/why-its-getting-harder-to-measure
World Reasoning Arena – A comprehensive benchmark for evaluating world model – Expose a substantial gap between current models and human-level hypothetical reasoning
https://x.com/arankomatsuzaki/status/2038443186255991169
GLM-5V-Turbo is now live in Vision Arena. Test its ability to reason over visual inputs using your real-world prompts. Don’t forget to vote so we can see how it stacks up.
https://x.com/arena/status/2039400189178556814
Are open source models catching up to proprietary models? We’ve looked back at 3 years of Arena’s data to show how the race has evolved. For comparison, we’ve taken the top 20% of the models and uncovered the following: – Before mid 2024: The gap was between 100-150 points – In
https://x.com/arena/status/2037584085997216100
Today we drop Trinity-Large-Thinking. SOTA on Tau2-Airline, frontier-class on Tau2-Telecom, and the #2 model on PinchBench, right behind Opus. On BFCLv4, we’re in the mix with the best. 26 people with under $50M raised and a ruthless pursuit of greatness. What this team just
https://x.com/MarkMcQuade/status/2039375842560872834
One way to see the advancement of AI is to see how much further you can get with new models on the same hardware Here is “an otter using a laptop on an airplane” generated on my home computer using the open weights Wan 2.1, first try. We have come pretty far in 18 months.
https://x.com/emollick/status/2037616578787713194
Today we announce a new evaluation framework to improve AI benchmark reproducibility. By optimizing the ratio of the number of items to human raters per item, we can better capture the nuance of human disagreement in subjective tasks. Learn more:
https://x.com/GoogleResearch/status/2039014600927043926
Gemma 4 31B shifts the Pareto frontier, scoring +30 Arena points above similarly priced models like DeepSeek 3.2. Its position on the Pareto frontier is based on early pricing indicators from third parties.
https://x.com/arena/status/2040128319719670101
impressive, very nice. now let’s compare a 31b dense to a 31b active 670b total instead. flop for flop
https://x.com/stochasticchasm/status/2039912148676264334
Almost everyone is talking about @GoogleResearch’s TurboQuant (and for good reason) ➡️ It lets you run a 3-bit system with the accuracy of a full-precision model. Technically, TurboQuant is a compression algorithm that shrinks high‑dimensional vectors to low precision without
https://x.com/TheTuringPost/status/2037182800466698718
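For intuition on what "shrinks high-dimensional vectors to low precision" means mechanically, here is a generic uniform scalar quantizer sketch. To be clear, this is NOT TurboQuant's algorithm (the thread describes randomized preprocessing before quantization); it is only the baseline idea of mapping floats to a few signed integer codes plus a scale:

```python
def quantize(vec, bits=3):
    """Uniform symmetric scalar quantization of a float vector to `bits`
    bits per dimension. Returns (codes, scale); reconstruct via codes*scale.
    Generic baseline only — not TurboQuant's actual method."""
    levels = 2 ** (bits - 1) - 1                 # e.g. 3 for signed 3-bit codes
    scale = max(abs(x) for x in vec) / levels or 1.0
    codes = [max(-levels - 1, min(levels, round(x / scale))) for x in vec]
    return codes, scale

def dequantize(codes, scale):
    return [c * scale for c in codes]

codes, scale = quantize([0.9, -0.3, 0.0, 0.45])
print(codes)   # [3, -1, 0, 2], with scale = 0.3
```

Each dimension now costs 3 bits instead of 32; the reconstruction error per dimension is bounded by half a quantization step (scale/2).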
We need to publicly clarify serious issues in Google’s ICLR 2026 paper TurboQuant. TurboQuant misrepresents RaBitQ in three ways: 1. Avoids acknowledging key methodological similarity (JL transform) 2. Calls our theory “suboptimal” with no evidence 3. Reports results under
https://x.com/gaoj0017/status/2037552350924042488
GEditBench v2 A Human-Aligned Benchmark for General Image Editing paper:
https://x.com/_akhaliq/status/2039007111741366620
Gen-Searcher Reinforcing Agentic Search for Image Generation paper:
https://x.com/_akhaliq/status/2039000804061847801
Today we’re releasing Trinity-Large-Thinking. Available now on the Arcee API, with open weights on Hugging Face under Apache 2.0. We built it for developers and enterprises that want models they can inspect, post-train, host, distill, and own.
https://x.com/arcee_ai/status/2039369121591120030
[2603.24477] Composer 2 Technical Report
https://arxiv.org/abs/2603.24477
@aryaman2020 @jatin_n0 @voooooogel if it makes you feel better: i also introduced the idea of generating useful steering vectors from contrastive synthetic data in my 2016 paper – a whole section on augmenting inputs with low pass gaussian filter to derive a steering vector that produces less blurry samples.
https://x.com/dribnet/status/2039775902368948363
$5.5m to $73k per year (!!) by: 1) decomposing business logic 2) modeling intent using DSPy 3) optimizing a smaller model to improve cost profile *while maintaining performance* Why wouldn’t this be the default pattern for folks embedding AI into their pipelines?
https://x.com/kmad/status/2038659241238503716
14 most important and influential types of JEPA ▪️ JEPA / H-JEPA ▪️ I-JEPA ▪️ MC-JEPA ▪️ V-JEPA ▪️ Audio-JEPA ▪️ Point-JEPA ▪️ 3D-JEPA ▪️ ACT-JEPA ▪️ V-JEPA 2 ▪️ LeJEPA ▪️ Causal-JEPA ▪️ V-JEPA 2.1 ▪️ LeWorldModel ▪️ ThinkJEPA Save the list and check this out to explore these
https://x.com/TheTuringPost/status/2038222542243238399
2 methods that help Transformers to retrieve from depth (layers): ▪️ Attention Residuals (AttnRes) – makes the residual stream depth-aware, letting each layer use information from multiple earlier layers. ▪️ Mixture-of-Depths Attention (MoDA) – makes the attention heads
https://x.com/TheTuringPost/status/2037817485110772166
28T tokens for a 350M model!?
https://x.com/abacaj/status/2039158882111521190
anemll-flash-mlx repo is up! Simple toolkit to speed up Flash-MoE experiments on MLX. Let MLX do what it does best – dense inference in memory. We only optimize the MoE part: stable slot-bank + SSD streaming, clean hit/miss separation, no per-token expert materialization.
https://x.com/anemll/status/2038684375425200360
At our last DSPy meetup, @kshetrajna shared this amazing case study about how he’s using DSPy at @Shopify scale. I think this was my favorite slide.
https://x.com/dbreunig/status/2038650860843245814
Aurora: It’s an open-source, RL-based framework that learns directly from live inference traces and continuously updates the speculator without interrupting serving.
https://www.together.ai/blog/aurora
Axolotl v0.16.0 is here. Two big pillars in this release: 1. MoE & LoRA — making MoE fine-tuning fast (15x faster, 40x less memory) and LoRA training seamless, across architectures, out of the box 2. GRPO — async training (58% faster), custom Triton kernels, environment
https://x.com/winglian/status/2039739597287047384
Block – From Hierarchy to Intelligence
https://block.xyz/inside/from-hierarchy-to-intelligence
Code is open-sourced and we welcome contributions! Blog:
https://t.co/0p9HC3M6oa Paper:
https://t.co/fvLuHrqDbX Code:
https://x.com/togethercompute/status/2039099854702669835
Fujitsu One Compression
https://fujitsuresearch.github.io/OneCompression/
Great explainer by @AfterQuery of the choices that go into SFT->RL training: picking checkpoints, selecting the right RL tasks, designing informative rewards, and measuring performance. The outcome: a 5x score improvement on a 20B model!
https://x.com/tinkerapi/status/2039049192451301761
Great to see @anyscalecompute ship DP group fault tolerance in Ray Serve LLM for vLLM WideEP deployments — failed DP groups are isolated and rebuilt atomically while healthy groups keep serving. This complements vLLM’s Elastic EP at the engine level. Orchestration + engine, two
https://x.com/vllm_project/status/2039870472092049458
HandX Scaling Bimanual Motion and Interaction Generation paper:
https://x.com/_akhaliq/status/2039029830323253546
HeavyBall 3.0.0 is finally out. Key features: * FSDP * DDP * End-to-End Compilation (2.5x speedup) * Higher-precision PSGDKron (grey, vs. HB2’s blue) * Faster Muon and SOAP * PSGD-PRO (yellow) * LATHER, a SOAP-like optimizer * HyperBall * explicit `consume_grad` * simplified
https://x.com/Clashluke/status/2039374459375677814
How can we autonomously improve LLM harnesses on problems humans are actively working on? Doing so requires solving a hard, long-horizon credit-assignment problem over all prior code, traces, and scores. Announcing Meta-Harness: a method for optimizing harnesses end-to-end
https://x.com/yoonholeee/status/2038640635482456118
How we optimized Dash’s relevance judge with DSPy – Dropbox
https://dropbox.tech/machine-learning/optimizing-dropbox-dash-relevance-judge-with-dspy
If you want to understand the latest landscape of RL training and frameworks this blog by @DirhousssiAmine and the @huggingface team compares 16 different RL frameworks from VeRL, SLIME , TRL to many more across the following aspects. > Orchestration & Concurrency Primitive:
https://x.com/adithya_s_k/status/2039406523076767821
In my opinion, the best thing about CuTeDSL kernels is how you don’t really need to do opaque CuTe layout gymnastics at all. Instead, you can just inline any kind of PTX like cpasync, ldmatrix (x2/x4), mma.sync and so on purely in Python. I used to pray for times like this.
https://x.com/maharshii/status/2039379662066131296
Introducing 🤗 Transformers.js v4: state-of-the-art machine learning for the web! 🚀 New WebGPU backend (browser, Node.js, Bun, Deno) ⚡️ Huge performance improvements 🤯 Support for over 200 architectures 🛠️ Complete codebase refactor Learn more about our biggest release yet! 👇
https://x.com/xenovacom/status/2038610331417608691
It’s my favorite kind of work: linear algebra insight + fast kernels. When playing w Muon a while ago, we were thinking why not speed it up by operating on the small square matrix X X^T instead of the large rectangular matrix X. Jack, Noah, and Berlin spent many months
https://x.com/tri_dao/status/2038666307738964466
Just finished reading an excellent paper from @stanfordnlp: “Sycophantic AI decreases prosocial intentions and promotes dependence.” What feels like helpful validation from AI can quietly reduce accountability and prosocial behavior, while fostering dependence. The result is
https://x.com/Zulfikar_Ramzan/status/2038408402809090554
Lambda’s @mlcommons #MLPerf Inference v6.0 results are out. Three findings worth knowing before your next deployment:
https://x.com/LambdaAPI/status/2039365318276268173
LFM2.5-350M: No Size Left Behind | Liquid AI
https://www.liquid.ai/blog/lfm2-5-350m-no-size-left-behind
New from Together Research: Aurora. Speculative decoding that adapts to shifting traffic in real time — and keeps improving the longer it runs. Open-source, RL-based, 1.25x faster vs. a well-trained static speculator with no offline retraining pipeline. Thread 🧵
https://x.com/togethercompute/status/2039099845856903644
new: multiple vector columns store multiple embeddings for the same document – each with its own dimensions, types, and ANN index multimedia → multiple vectors docs:
https://x.com/turbopuffer/status/2039734876954632428
Of course, this is a trivial example of prompt-based speculative decoding, because the model recites sections from what is already in the prompt (so don’t get too excited 😉). Still it’s a nice and quick showcase of some of llama.cpp capabilities
https://x.com/ggerganov/status/2039753496317059270
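The trick being showcased — drafting tokens by copying from text already in the prompt — can be sketched in a few lines. This is a minimal illustration of the idea, not llama.cpp's implementation, whose details differ:

```python
def prompt_lookup_draft(tokens, ngram=3, max_draft=8):
    """Propose draft tokens by matching the last `ngram` tokens against an
    earlier position in the context and copying what followed there — the
    'free' speculation that works when the model recites from the prompt."""
    if len(tokens) < ngram:
        return []
    key = tuple(tokens[-ngram:])
    # Scan from the most recent candidate match backwards.
    for i in range(len(tokens) - ngram - 1, -1, -1):
        if tuple(tokens[i:i + ngram]) == key:
            return tokens[i + ngram : i + ngram + max_draft]
    return []

ctx = "See section 2 . 1 for details . As noted in section 2 .".split()
print(prompt_lookup_draft(ctx, ngram=2, max_draft=3))  # ['1', 'for', 'details']
```

The drafted tokens are then verified in one batched forward pass by the real model; when the model really is reciting, most drafts are accepted and decoding speeds up for free.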
Oh Memories, Where’d You Go | Weaviate
https://weaviate.io/blog/engram-internal-use-case
Okay LLM + PyTorch people, trunc_normal_, what the fuck! Many LLM inits use it w/ default cutoffs. It’s either not doing anything or it’s quite broken due to 2 issues. 1. The a/b cutoffs in PyTorch are not in std-devs, they are absolute. So w/ a std=0.02, and -2/2 (default arg)
https://x.com/wightmanr/status/2038634643843682366
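The complaint is easy to verify numerically without PyTorch: with std=0.02, absolute cutoffs at ±2 sit 100 standard deviations out and truncate essentially nothing, while the presumably intended ±2σ cutoffs (a=-2*std, b=2*std) remove the usual ~4.6% two-sided tail. A stdlib-only check:

```python
import math

def clipped_mass(std, a, b):
    """Probability mass of N(0, std^2) falling OUTSIDE the absolute
    interval [a, b] — i.e. how much the truncation actually removes."""
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / (std * math.sqrt(2.0))))
    return 1.0 - (cdf(b) - cdf(a))

std = 0.02
print(clipped_mass(std, -2.0, 2.0))                     # 0.0 — defaults clip nothing
print(round(clipped_mass(std, -2 * std, 2 * std), 4))   # 0.0455 — intended ±2σ clip
```

So `torch.nn.init.trunc_normal_(w, std=0.02)` with the default `a=-2., b=2.` is effectively an untruncated normal; passing `a=-2*std, b=2*std` gives the truncation most configs intend.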
overwhelming evidence for late interaction / multi-vector models yet again 🙂 > even after finetuning, single-vector models lag far behind multi-vector embeddings, which achieve significant performance gains and exhibit greater robustness to catastrophic forgetting.
https://x.com/lateinteraction/status/2039272441654993082
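The "late interaction" scoring being defended here is ColBERT-style MaxSim: each query token vector is matched against its best document token vector, then the maxima are summed. A toy sketch with invented 2-d embeddings showing why per-token matching can beat a single pooled vector:

```python
def maxsim(query_vecs, doc_vecs):
    """ColBERT-style late-interaction score: for each query token vector,
    take its best (max dot-product) match among the document's token
    vectors, then sum over query tokens."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy embeddings (made up): different query tokens match different doc tokens.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]   # covers both query tokens exactly
doc_b = [[0.7, 0.7], [0.7, 0.7]]   # one averaged-looking direction
print(maxsim(query, doc_a))  # 2.0
print(maxsim(query, doc_b))  # 1.4
```

A single-vector model must collapse doc_a into one embedding, washing out exactly the per-token structure that MaxSim exploits — one intuition for the robustness gap the tweet describes.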
Predicting When RL Training Breaks Chain-of-Thought Monitorability — LessWrong
https://www.lesswrong.com/posts/SvxaKP5KdkksZPcG7/predicting-when-rl-training-breaks-chain-of-thought
Pretraining is data-inefficient. This is entirely a consequence of the fact that we throw away the KV cache after every forward-backward step! If we can integrate efficient KV cache compaction into pretraining, we will unlock human level data efficiency. Neural KV cache
https://x.com/part_harry_/status/2039400872871068041
PrismML — Announcing 1-bit Bonsai: The First Commercially Viable 1-bit LLMs
https://prismml.com/news/bonsai-8b
Six years in the making and TRL is finally at v1.0! Started as a small side project to learn more about transformers+RL and is now downloaded 100k times everyday.
https://x.com/lvwerra/status/2039003207985197107
SkyPilot now works natively with @VAST_Data storage 🤝 Training runs often start with a dead period – copying data while GPUs sit idle. With VAST + SkyPilot, you can skip that entirely. • Mount petabyte-scale data directly. No staging, no prefetch pipelines, no idling •
https://x.com/skypilot_org/status/2039372218031845769
Small AI models and specialized vertical AI models are very brittle. Any unusual situation or out-of-distribution issue and they break down. You also won’t get emergent leaps or good problem solving. They still have uses, but benchmarks don’t do a good job of showing weaknesses
https://x.com/emollick/status/2037235669278367931
So what does a VP of Kernels do all day? Check out this behind the scenes look:
https://x.com/realDanFu/status/2039414710203015177
TAPS Task Aware Proposal Distributions for Speculative Sampling paper:
https://x.com/_akhaliq/status/2038998550881714402
the harness matters models are great but the harness shapes them to be good (and cost efficient) for work we care about
https://x.com/Vtrivedy10/status/2038993396463796638
The headline finding — online training from scratch surpasses a carefully pretrained static baseline: → Aurora: 3.08 accepted length, 302.3 tok/s → Static pretrained: 2.63 → Pretrained + finetuned: 2.99 Offline pretraining is not a prerequisite for effective speculative
https://x.com/togethercompute/status/2039099852924367186
The outcome of the approach is that kernels get far simpler and complex race conditions are defined away. This allows more powerful algorithms to compose, along with much more intra-kernel portability than ever before. Watch out megakernels!
https://x.com/clattner_llvm/status/2039028017843126406
The Together AI kernels team pushes performance to the next level. An investigation into how it works left more questions than answers, but VP of Kernels @realDanFu seemed proud. If you want the full picture, read on:
https://x.com/togethercompute/status/2039413297343332635
The TurboQuant paper (ICLR 2026) contains serious issues in how it describes RaBitQ, including incorrect technical claims and misleading theory/experiment comparisons. We flagged these issues to the authors before submission. They acknowledged them, but chose not to fix them.
https://x.com/gaoj0017/status/2037532673812443214
they have kairos, dreaming, ultrathink, ultraplan, ultrareview and complete integration with github and slack
https://x.com/scaling01/status/2039001738934468857
this is great, am writing a similar blog which is even more important: why there are open harnesses – memory cannot be stuck in a proprietary harness or behind a proprietary api
https://x.com/hwchase17/status/2040134178864546159
What if LLMs could remember as humans do? LLM memory is either perfect and lossless or ultra-compressed. What does a slightly compressed working memory to extend its context window look like? Our researchers built a 7M-parameter perceiver that compresses KV caches 8x while
https://x.com/baseten/status/2039389931328704905
When and how can test-time thinking allow models to use information latent in their training data? What are the benefits and tradeoffs relative to other solutions like synthetic data? Pleased to share (after a long delay) an exploration of these issues:
https://x.com/AndrewLampinen/status/2040157250686484638
Trinity-Large-Thinking achieves state of the art results on Tau2 airline, and is at frontier level on Tau2 telecom. It’s also the #2 model on PinchBench, just behind Opus 4.6, and we’re among the giants on BFCLv4
https://x.com/latkins/status/2039370549743243353
Deep transformers used to accumulate layer history. Now they are starting to retrieve from it. → @Kimi_Moonshot proposed Attention Residuals (AttnRes), driving this shift. They turn the residual stream into an attention problem. Why do we need it? Depth in Transformers mostly
https://x.com/TheTuringPost/status/2037107923109953788
Want to talk to the past? Here is an LLM “trained entirely from scratch on a corpus of over 28,000 Victorian-era British texts published between 1837 and 1899, drawn from a dataset made available by the British Library.” Quite different from an LLM roleplaying a Victorian.
https://x.com/emollick/status/2038084424810537215
Voxtral TTS paper is out! it’s a good read 🙂
https://x.com/qtnx_/status/2037553397423902846
The fact that every scientific paper in 2026 is still uploaded only as fully formatted PDFs to academic archive sites that often limit downloads tells you everything you need to know about how quickly the scientific system is adjusting to the potential of AI to accelerate science
https://x.com/emollick/status/2038820178264293482
Paper review: LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels
https://t.co/2dD7hPIURL Nice clean github:
https://t.co/YZ4e1eUACi This is the application of the LeJEPA results to world models, trained offline on experience from three different
https://x.com/ID_AA_Carmack/status/2039046172799578122
A sign that human creativity is a bottleneck is that this year everyone can generate almost any image or video they can think of for nearly free and the April Fools posts are basically just as bad as any other year.
https://x.com/emollick/status/2039379053480914959




