Technical and Dev: AI News Week Ending 03/20/2026

Image created with gemini-3.1-flash-image-preview with claude-sonnet-4-5. Image prompt: Using the provided reference image, preserve the deep midnight navy car hood, chrome pedestal base, shallow depth-of-field sky background, and dramatic upward camera angle exactly as shown. Replace only the Mercedes star ornament with a single polished chrome microchip CPU standing upright on its edge, mounted on the same pedestal at realistic hood ornament scale, its rectangular form and pin grid pattern clearly visible and catching specular highlights like jewelry. Add bold white sans-serif text reading TECH across the upper portion of the image in the style of a luxury automotive advertisement headline.

BREAKING 🚨: MiniMax released MiniMax M2.7, a new self-evolving model, achieving a score of 56.22% on SWE-Bench Pro. M2.7 was used for building complex agent harnesses during its own development. Users can now access MiniMax M2.7 via APIs and MiniMax Agent.
https://x.com/testingcatalog/status/2034250919345377604#m

During the iteration process, we also realized that the model’s ability to recursively evolve its harness is equally critical. Our internal harness autonomously collects feedback, builds evaluation sets for internal tasks, and based on this continuously iterates on its own
https://x.com/MiniMax_AI/status/2034315323109953605#m

Introducing MiniMax-M2.7, our first model which deeply participated in its own evolution, with an 88% win-rate vs M2.5 – Production-Ready SWE: With SOTA performance in SWE-Pro (56.22%) and Terminal Bench 2 (57.0%), M2.7 reduced intervention-to-recovery time for online incidents
https://x.com/MiniMax_AI/status/2034315320337522881#m

MiniMax Global Announces Full Year 2025 Financial Results – MiniMax News | MiniMax https://www.minimax.io/news/minimax-global-announces-full-year-2025-financial-results

Minimax M2.7 released! And it’s a big one. Highlights: Self-evolving – first model that helped build itself, running 100+ autonomous optimization loops during its own RL training (30% internal improvement). Strong coder – 56.2% on SWE-Pro (near Opus 4.6), 55.6% on VIBE-Pro,
https://x.com/kimmonismus/status/2034269026353082422#m

MiniMax M2.7: Early Echoes of Self-Evolution – MiniMax News | MiniMax https://www.minimax.io/news/minimax-m27-en

Ramp AI Index March 2026 update https://ramp.com/velocity/ai-index-march-2026

@_avichawla Impressive work from Kimi
https://x.com/elonmusk/status/2033528245464047805

🔥 @Kimi_Moonshot’s new Attention Residual paper is sparking discussions. Zhihu contributor OpenLLMAI shares a deep dive: “From Kimi’s Attention Residual to ‘Vertical Attention’, an idea I’ve been thinking about for half a year.” Some interesting thoughts on attention mechanisms
https://x.com/ZhihuFrontier/status/2033751367198949865

Avi Chawla on X: “Big release from Kimi! They just released a new way to handle residual connections in Transformers. In a standard Transformer, every sub-layer (attention or MLP) computes an output and adds it back to the input via a residual connection. If you consider this across 40+ layers, https://t.co/5i5AN9tzIm” / X
https://x.com/_avichawla/status/2033472650836914495
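
The standard residual update described in the post above can be sketched in a few lines. This is a minimal illustration of plain residual accumulation only, not of Kimi's Attention Residuals; the toy sub-layers and the names `run_residual_stack` and `scale` are hypothetical stand-ins for real attention or MLP blocks.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def run_residual_stack(
    x: Vector, sublayers: List[Callable[[Vector], Vector]]
) -> Tuple[Vector, List[Vector]]:
    """Apply h <- h + f(h) for each sub-layer; also return every f(h)."""
    h = list(x)
    outputs = []
    for f in sublayers:
        out = f(h)        # sub-layer output (attention or MLP in a real model)
        outputs.append(out)
        h = add(h, out)   # residual connection: add the output back to the input
    return h, outputs


def scale(factor: float) -> Callable[[Vector], Vector]:
    """Hypothetical toy sub-layer: multiply the hidden state by a constant."""
    return lambda h: [factor * v for v in h]


x = [1.0, 2.0]
final, outputs = run_residual_stack(x, [scale(0.5), scale(-0.25), scale(0.1)])

# Unrolled view: the final state is the input plus every sub-layer output,
# each added with the same fixed weight of 1.
unrolled = list(x)
for out in outputs:
    unrolled = add(unrolled, out)

print(final)
print(unrolled)
```

The unrolled loop makes the point visible: every sub-layer output enters the final hidden state with the same weight, regardless of depth.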

https://chatgpt.com/share/69cda240-9324-832a-89b6-a43d4a22f437

https://claude.ai/share/7239e73e-9e9d-469a-bbdb-e5c7da75a4e9

Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation. Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with
https://x.com/Kimi_Moonshot/status/2033378587878072424

visual summary of attention residuals by kimi, beautiful paper
https://x.com/eliebakouch/status/2033488233854620007

5.3 to 5.4 is what i would have expected to warrant a jump to GPT-6
https://x.com/yacineMTB/status/2033291560217923803

A knowledge-work platform built around GPT-5.4 Pro level intelligence would be really useful. The gap between other models and what Pro can do on complex intellectual work remains stark. I would love to have access in a Codex-like platform with shared file spaces, subagents, etc
https://x.com/emollick/status/2033959257196966360

GPT-5.4 mini matters for subagents because it changes what feels worth handing off. The parent thread should hold the architecture, plan, and progress narrative. Fast subagents can explore the repo, check hypotheses, and preserve the parent thread’s limited attention.
https://x.com/nickbaumann_/status/2034134875234832540#m

BrowseComp-Plus, perhaps the hardest popular deep research task, is now solved at nearly 90%… … and all it took was a 150M model ✨ Thrilled to announce that Reason-ModernColBERT did it again and outperform all models (including models 54× bigger) on all metrics
https://x.com/antoine_chaffin/status/2034649565614272925

“a large jump in agentic” – we agree 🙌 M2.7 is a big step forward in agentic workflows, from tool use to real-world, multi-step execution. Now live on @OpenRouter 🚀
https://x.com/MiniMax_AI/status/2034356786413867182#m

🔍Follow Zhihu contributor toyama nao, a top large model reviewer, to evaluate @MiniMax_AI MiniMax-M2.7’s capabilities in detail!✨ 📌 Basic Info： MiniMax iterates monthly in the Agent-driven model track. As a minor version upgrade, M2.7 carries its new understanding of the
https://x.com/ZhihuFrontier/status/2034543142234628318

DEFAULT and FREE M2.7 on @zocomputer
https://x.com/MiniMax_AI/status/2034348503347171625#m

Early testers are saying that M2.7 has big improvements in emotional intelligence and character consistency 👀
https://x.com/MiniMax_AI/status/2034528945962696948

Great to see M2.7 live on @vercel_dev 🙌 We’re seeing a real shift from simple tool use → multi-step agentic workflows running in production. M2.7 is built for exactly that.
https://x.com/MiniMax_AI/status/2034357583797178841#m

Live Stream Alert with @OpenClaw Thursday 9PM ET We will share an in-depth look at MiniMax M2.7, including early developments in self-evolution and efficient solutions designed to support 100,000 OpenClaw running clusters. 🎁 MiniMax vouchers will also be distributed during
https://x.com/MiniMax_AI/status/2034520321466978488

M2.7 is already up😎 Try it on @kilocode.
https://x.com/MiniMax_AI/status/2034339731660759097#m

M2.7 now live on @yupp_ai 🌸 Feels like a good time to build something new.
https://x.com/MiniMax_AI/status/2034328337527783857#m

M2.7 now on @opencode ⚙️ give it a plan → it runs with it add the loop (check → fix → retry) and things start to feel very agentic
https://x.com/MiniMax_AI/status/2034361282527461473#m

Minimax 2.7 incoming!
https://x.com/kimmonismus/status/2033531736647463151

Minimax 2.7 is available in Hermes Agent through the Minimax Provider, try it today!
https://x.com/Teknium/status/2034658808870621274

MiniMax doubles in Hong Kong debut, marking yet another Chinese AI listing https://www.cnbc.com/2026/01/09/minimax-hong-kong-ipo-ai-tigers-zhipu.html

MiniMax has released MiniMax-M2.7, delivering GLM-5-level intelligence for less than one third of the cost MiniMax-M2.7 from @MiniMax_AI scores 50 on the Artificial Analysis Intelligence Index, an 8-point improvement over MiniMax-M2.5, which was released one month ago. This is
https://x.com/ArtificialAnlys/status/2034313314420019462#m

MiniMax launches M2.7 model on MiniMax Agent and APIs https://www.testingcatalog.com/minimax-launches-m2-7-model-on-minimax-agent-and-apis/

MiniMax M2.7 now live on @Trae_ai Excited to see what you ship. 🙌
https://x.com/MiniMax_AI/status/2034327432124350924#m

MiniMax M2.7: Early Echoes of Self-Evolution
https://x.com/MiniMax_AI/status/2034335605145182659

MiniMax M2.7🆚MiniMax M2.5 – Website about recently released video games The release of M2.7 should be close. MiniMax M2.5 was released two days after it appeared on the Arena
https://x.com/AiBattle_/status/2033503838284447758

MiniMax-M2.7 is now available on Ollama’s cloud. made for coding and agentic tasks 🖥️ Try it inside Claude Code: ollama launch claude --model minimax-m2.7:cloud 🦞 Use it with OpenClaw: ollama launch openclaw --model minimax-m2.7:cloud If you already have OpenClaw
https://x.com/ollama/status/2034351916097106424#m

We’re now going to start having these really awesome benchmarks where as you get better on the benchmark, you’re not just re-solving an exam question- you’re solving something no one has solved before you and making the world a better place.
https://x.com/OfirPress/status/2034298283774877926#m

ByteDance also implemented attention over depth. They literally combined it with sequence attention.
https://x.com/rosinality/status/2033810580604158323

Help us measure the progress towards AGI (specifically cognitive capabilities) by building benchmarks on @kaggle, with $200K in prizes available! Details in 🧵
https://x.com/OfficialLoganK/status/2033978254344786351

lmao > google cooks paper, “meh its probably not gonna work, pass” > chinese lab cooks exact same thing one year later, everyone gets super hyped EVERY SINGLE TIME
https://x.com/cloneofsimo/status/2033586628770570323

📎We’ve uploaded it to arXiv, enjoy! https://x.com/Kimi_Moonshot/status/2033796781327454686

🔥 An insider take on @Kimi_Moonshot ‘s Attention Residual — From Kimi AI infra team member & Zhihu contributor Reku A rare look at how attention ideas collide with real-world training systems 👇 🧠 Attention Residual isn’t just modeling — it’s an infra challenge I mainly worked
https://x.com/ZhihuFrontier/status/2034269774281400798#m

As a member of the Kimi team, I wrote the linked blog to share how our team tackles truly innovative work together–not just as individuals, but as a coordinated group. 💎I fully agree: “you can always trust the Kimi solidness.” For us, solidness means making ideas actually work
https://x.com/YyWangCS17122/status/2034273847164473820#m

For more details, check out our paper here:
https://x.com/Kimi_Moonshot/status/2033378599450079581

Thread by @Kimi_Moonshot on Thread Reader App – Thread Reader App https://threadreaderapp.com/thread/2033378587878072424.html

Xiaomi has released MiMo-V2-Pro, which scores 49 on the Artificial Analysis Intelligence Index, placing it between Kimi K2.5 and GLM-5 @Xiaomi’s MiMo-V2-Pro is a new reasoning model and a significant upgrade over their prior open weights release, MiMo-V2-Flash (309B total / 15B
https://x.com/ArtificialAnlys/status/2034239267052896516#m

BullshitBench update: The new GPT-5.4 mini and nano models score quite low. This screenshot shows OpenAI models only, on the full list would put GPT-5.4-mini around 40th place and Nano is around 70th place. Again thinking didn’t help much at all.
https://x.com/petergostev/status/2033995459522396287

GPT 5.4 is a big step for Codex – by Nathan Lambert https://www.interconnects.ai/p/gpt-54-is-a-big-step-for-codex

gpt-5.4 has ramped faster than any other model we’ve launched in the API: within a week of launch, 5T tokens per day, handling more volume than our entire API one year ago, and reaching an annualized run rate of $1B in net-new revenue. it’s a good model, try it out!
https://x.com/gdb/status/2033605419726483963

GPT-5.4 nano is also available starting today in the API.
https://x.com/OpenAI/status/2033953595637538849

GPT-5.4-mini looks really good for computer-use
https://x.com/scaling01/status/2033954794105127007

Ran a small eval today on an LM using GPT-5.2 as a judge. Model scores 10%, but paper reports it scoring 34%. I see that the paper uses GPT-5.1 as a judge; for the sake of consistency I change it. Switch to GPT-5.1 as a judge. Model now scores 43.5%… bro
https://x.com/a1zhang/status/2034059629072945251#m

This, but for real* Here’s METR-style graph of labor displacement from Roman aqueducts, doubling time of CDDII years. Lesson: 1) Displacing terrible work is good 2) All exponentials become s-curves in the end * I had GPT-5.4 Pro do the research, spot checks seemed accurate.
https://x.com/emollick/status/2033636278508425646

[2603.17378] Efficient Exploration at Scale https://arxiv.org/abs/2603.17378

@dexhorthy i’m so glad to see all this because i had a gut aversion to writing specs it felt like it was as much work as writing the code and its way more fun thinking through things by writing the code (with ai) you all articulated why that is better – was worried i’d be forced into
https://x.com/thdxr/status/2034095613822808165?s=20

🚀 A new @code weekly release is here! One of our favorite features is integrated browser debugging, which lets you debug your web app end-to-end without leaving the editor.
https://x.com/code/status/2034332099231072639#m

🚀 Introducing Qianfan-OCR: a 4B-parameter end-to-end model for document intelligence. One model. No pipeline. Table extraction, formula recognition, chart understanding, and key information extraction, all in a single pass. Paper: https://t.co/cmNhv5SLgV Models:
https://x.com/Baidu_Inc/status/2034265136182202765#m

10x Data Efficiency – NanoGPT Slowrun – Q https://qlabs.sh/10x

360 billion tokens, 3 million customers, 6 engineers – Vercel https://vercel.com/blog/360-billion-tokens-3-million-customers-6-engineers

Actually these ideas predate a lot earlier as well. But let’s celebrate all of the papers and the current results too 🤝 Memory 📈
https://x.com/_arohan_/status/2033587983455293638

Antoine and team had trained a nice ColBERT late interaction model last year… Now they decided to try it on BrowseComp+, the canonical “deep research” task. Guess what, it’s not only the strongest method by far but also basically solved the task (~90%). Who would have thunk!
https://x.com/lateinteraction/status/2034651175023157550
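
For readers unfamiliar with late interaction, ColBERT-style scoring (often called MaxSim) can be sketched as follows: each query token vector is matched to its best document token vector, and those maxima are summed. The tiny 2-dimensional vectors and document names are hypothetical, and this is not the Reason-ModernColBERT implementation.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Sum, over query tokens, of the best similarity to any document token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Hypothetical token embeddings: two query tokens, two candidate documents.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[0.9, 0.1], [0.1, 0.9]],  # one token close to each query token
    "doc_b": [[0.7, 0.7]],              # a single "averaged" token
}

scores = {name: maxsim_score(query, vecs) for name, vecs in docs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranked)
```

In this toy case the document that keeps a separate vector per token outscores the one collapsed into a single vector, which is the intuition behind keeping multiple vectors per document.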

ASML doesn’t get enough credit for what they are doing. EUV lithography machines are so extraordinarily complex – with deep, narrow supply chains (like Zeiss’s small mirror team) that can’t scale fast enough – that production is likely capped around 100 machines per year by 2030,
https://x.com/kimmonismus/status/2034290731246907618#m

AttnRes is not just a typical “novelty paper”. it stems from a much bigger project, co-designed by both model research and infra teams, with considerations that go way beyond just “lower loss” or “better expressivity”. here is the “ultra think pro xhigh” part from inference
https://x.com/bigeagle_xd/status/2034104829703045258#m

Axiom: The form of AI that we ended up with is deeply weird in ways that we don’t fully get. Attempts to pretend AI is less weird & apply it like a standard IT product will inevitably result in less useful & far less reliable AI implementations than those that embrace weirdness.
https://x.com/emollick/status/2033704330444861736

Can LLMs Be Computers? | Percepta https://www.percepta.ai/blog/can-llms-be-computers

CoderPad State of Tech Hiring 2026 – CoderPad https://coderpad.io/survey-reports/coderpad-state-of-tech-hiring-2026/

Cold starts for large models are one of the hardest problems in AI inference infrastructure. Today we’re launching the Baseten Delivery Network (BDN) to solve one of the hardest parts of this problem. 2-3x faster cold starts for large models at scale via optimizations at the
https://x.com/baseten/status/2034681788724019700

Continued Pretraining is going to be more and more common to help people unlock the full potential of their RL environments.
https://x.com/code_star/status/2034672762263060562

damn this is so good and encapsulates everything I’ve been seeing/saying in the last few months – a spec that is sufficiently detailed to generate code with a reliable degree of quality is roughly the same length and detail as the code itself – so don’t review those things,
https://x.com/dexhorthy/status/2033980486813684181?s=20

Did you know? We’re funding independent research in AI evaluation and measurement–up to $50k per project. The Q1 deadline to apply for Arena’s Academic Partnerships Program is March 31.
https://x.com/arena/status/2034294095150215182#m

Everyone talks about how good multivector models like ColPali and ColBERT are. But it always comes up that they 𝘢𝘭𝘴𝘰 require lots more memory – unless you use 𝗠𝘂𝘃𝗲𝗿𝗮. A dataset with 1M documents using ColBERT can require 40GB of memory just for the embeddings. Compare
https://x.com/victorialslocum/status/2034253990582423716#m
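
A back-of-the-envelope calculation shows how a multivector index can reach the 40GB scale mentioned above. The per-document token count (80), embedding dimension (128), and float32 storage are assumptions chosen for illustration, not figures from the post.

```python
def multivector_embedding_bytes(
    num_docs: int, tokens_per_doc: int, dim: int, bytes_per_value: int
) -> int:
    """Raw storage for one embedding vector per token of every document."""
    return num_docs * tokens_per_doc * dim * bytes_per_value


# Assumed values (hypothetical): 80 tokens per document, 128 dims, float32.
total = multivector_embedding_bytes(
    num_docs=1_000_000, tokens_per_doc=80, dim=128, bytes_per_value=4
)
print(f"{total / 1e9:.2f} GB")  # prints "40.96 GB"
```

Because storage grows linearly with tokens per document, longer documents push the total up proportionally under the same assumptions.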

“i don’t think you can really wishcast better underlying architectural primitives than any-to-any parallel communication over factorized sequences” Yet another reminder of why dense single-vector retrieval never stops being broken on any semi-challenging OOD setting!
https://x.com/lateinteraction/status/2034254747666960683#m

I trained a neural network to mimic a forest trail near my apartment (web demo in the post) https://x.com/madebyollin/status/1915838978718158890

I’ll be honest: we didn’t necessarily want to build hankweave – we didn’t have the time to build a runtime. (Who does?) The problem is that hankweave becomes an inescapable requirement once you add up some self-evident truths about AI today.
https://x.com/hrishioa/status/2034666470932922745

If I had to compress my PhD into one idea, it is this “The data a model sees early in training leaves an imprint on its representations that is very hard to undo later” This thread runs through – Rephrasing the Web – Safety Pretraining – TOFU This is the Finetuner’s Fallacy🧵
https://x.com/pratyushmaini/status/2034653569706811782

Implementing Attention Residuals. https://x.com/arjunkocher/status/2033846693918347641

Incorporating SFT data during pretraining is more effective for finetuning than the plain pretraining and finetuning scheme, even considering replay during finetuning. But the ratio of SFT data during pretraining should consider the token budget for pretraining. They built a
https://x.com/rosinality/status/2034178558440898786#m

Industry tends to default to fine-tuning for domain adaptation because it seems cheaper, but only if you don’t consider inference. In new work from @datologyai, we show that mixing domain-specific data early into training drives better performance and reduces inference costs.
https://x.com/arimorcos/status/2034295652193370602#m

Introducing M²RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling We bring back non-linear recurrence to language modeling and show it’s been held back by small state sizes, not by non-linearity itself. 📄 Paper: https://t.co/IPECFJ7f3p 💻 Code:
https://x.com/MayankMish98/status/2034681226217595333

It works on a frontier scale and is elegant, both in terms of the formula and the paper itself 🙂 (@yzhang_cs has become the go-to aesthetics person after these figures, no longer a pure researcher…)
https://x.com/nathancgy4/status/2033390157102244098

LLM Architecture Gallery | Sebastian Raschka, PhD https://sebastianraschka.com/llm-architecture-gallery/

Mamba-3 is out! 🐍 SSMs marked a major advance for the efficiency of modern LLMs. Mamba-3 takes the next step, shaping SSMs for a world where AI workloads are increasingly dominated by inference. Read about it on the Cartesia blog:
https://x.com/cartesia/status/2034338862559121475#m

MiMo-V2-Pro | Xiaomi https://mimo.xiaomi.com/mimo-v2-pro

Models are typically specialized to new domains by finetuning on small, high-quality datasets. We find that repeating the same dataset 10-50× starting from pretraining leads to substantially better downstream performance, in some cases outperforming larger models. 🧵
https://x.com/_christinabaek/status/2034285795071205737#m

Nonlinear RNNs seem to do sth genuinely different from attn and linear RNNs/SSMs. By themselves they already do quite well w the right parametrization, but just one nonlinear RNN layer substantially improves transformer-mamba/deltanet hybrid!
https://x.com/tri_dao/status/2034696258938708438

Super impressive results! Also a great reflection of how training pipelines have evolved for specialized LLMs. Continued pretraining over domain-specific / curated data to create a good seed for large-scale RL. Simple (at least conceptually) and very effective. “These quality
https://x.com/cwolferesearch/status/2034713982515179672

thanks to @cartesia for supporting this project, providing compute, and testing the models! we believe that such research advances are highly impactful for natural, real-time intelligence and continue to invest in the frontier of efficient models Blog cross-posted:
https://x.com/_albertgu/status/2034347202613739947#m

The Finetuner’s Fallacy Finetuning seems like the cheapest path to model adaptation, but introducing domain data during pretraining makes it much more valuable Through specialized pretraining you can train a smaller model that outperforms a bigger model that was just finetuned
https://x.com/pratyushmaini/status/2034296042540466252#m

The gripper misses by 2mm. Latency spikes kill your control loop. The sim-to-real gap eats weeks of engineering time you don’t have. Your model works in simulation. It even looks great in the demo. Then you deploy it… and everything breaks. Not a model, It’s an infrastructure
https://x.com/IlirAliu_/status/2032524422423196029

The newest model in the Mamba series is finally here 🐍 Hybrid models have become increasingly popular, raising the importance of designing the next generation of linear models. We’ve introduced several SSM-centric ideas to significantly increase Mamba-2’s modeling capabilities
https://x.com/_albertgu/status/2033948415139451045

There’s an economics theorem called Alchian-Allen. And it has the very interesting implication that AI labs will be able to charge *higher* margins on their best models as compute gets scarcer. As compute gets more expensive, the cost of running any model goes up. So you might
https://x.com/dwarkesh_sp/status/2032572157243302154

This is important, as pointed out by @PV90169 NVLink has no duplex either and the GBps in spec is a complete fabrication – you only get half of that speed. Same as with NVLINK-C2C I confirmed it by benchmarking it and posted the results confirming that NVLink has no duplex 2x
https://x.com/StasBekman/status/2034315810693599349#m

This is THE sleeper reason teams stick with DSPy.
https://x.com/dbreunig/status/2034061742196859076#m

This paper is the same as the DeepCrossAttention (DCA) method from more than a year ago: https://t.co/Ng79Xr7rG5. As far as I understood, here there is no innovation to be excited about, and yet surprisingly there is no citation and discussion about DCA! The level of redundancy
https://x.com/behrouz_ali/status/2033581834953453853

vLLM Production Stack now has an end-to-end deployment guide on @OracleCloud OKE 🚀 Self-hosted LLM inference on OCI bare metal GPUs (A10, A100, H100) — from provisioning to first request. OCI deployment scripts are contributed and maintained in the official production-stack
https://x.com/vllm_project/status/2033560408980914550

We made TurboAPI hit 150k req/s. In under a day. It is now 22x faster than FastAPI Thanks to the amazing contributions from the people in the comment section, which allowed me to view what made the hyper optimized frameworks work the way that they do! Here’s what changed..
https://x.com/rachpradhan/status/2034576637359161365

We put AssistantBench out 1.5 yrs ago, but it’s unsolved and teams are making submissions that achieve new SOTAs! This example question was written by me, I was actually living near there at the time, and frustrated that personal assistants couldn’t answer basic questions.
https://x.com/OfirPress/status/2034347578653868374#m

We’ve been lucky enough to test Mamba-3 ahead of the curve. 🧪 Here is how it integrates into Hybrid Models (Spoiler: it unlocks Muon for SSMs for the first time). 🧵
https://x.com/JG_Barthelemy/status/2034039081085108390#m

We’ve spent years building LlamaParse into the most accurate document parser for production AI. Along the way, we learned a lot about what fast, lightweight parsing actually looks like under the hood. Today, we’re open-sourcing a light-weight core of that tech as LiteParse 🦙
https://x.com/llama_index/status/2034661997644808638

Yann LeCun is pumping out papers recently “Temporal Straightening for Latent Planning” This paper shows that by straightening latent trajectories in a world model, Euclidean distance starts to reflect true reachable progress, so it’s closer to geodesic/minimum-step distance.
https://x.com/askalphaxiv/status/2033345556949397718

Moral of the story is not to use LLM-as-judge without determining human correlation or tuning to maximize that
https://x.com/torchcompiled/status/2034068339023102060#m
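
One way to act on that advice is to score each candidate judge against a small human-labeled set before trusting it. The sketch below computes raw agreement and Cohen's kappa for binary pass/fail verdicts; the labels and judge names are made-up illustrative data, not results from any model mentioned here.

```python
from typing import List


def agreement(a: List[int], b: List[int]) -> float:
    """Fraction of items on which two raters give the same binary verdict."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: List[int], b: List[int]) -> float:
    """Agreement between two binary raters, corrected for chance agreement."""
    observed = agreement(a, b)
    p_a = sum(a) / len(a)  # rate at which rater a says "pass"
    p_b = sum(b) / len(b)  # rate at which rater b says "pass"
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:    # both raters constant and identical
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical verdicts (1 = pass, 0 = fail) on the same eight responses.
human = [1, 1, 0, 0, 1, 0, 1, 0]
judges = {
    "judge_a": [1, 1, 0, 0, 1, 0, 0, 0],
    "judge_b": [1, 1, 1, 1, 1, 0, 1, 1],
}

kappas = {name: cohens_kappa(human, verdicts) for name, verdicts in judges.items()}
best = max(kappas, key=kappas.get)
print(kappas)
print(best)
```

Picking the judge with the highest kappa, rather than the one that yields the most flattering score, keeps reported numbers tied to human judgment.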

We’re re-releasing Open SWE with a new emphasis on deep integrations with the applications you already use It integrates with: – Slack – Linear – Github so you can use it directly from the applications you already spend your time in, without needing to learn a new platform
https://x.com/BraceSproul/status/2033962118970818650

AI Security Best Practices Guide | Datadog https://www.datadoghq.com/resources/ai-security-best-practices/