Image created with gemini-3.1-flash-image-preview with claude-sonnet-4-5. Image prompt: Photorealistic wide shot of a natural amphitheater carved into a frozen bay ice shelf at winter dusk, concentric semicircular ice terraces descending to dark water, mathematical symbols and diagrams naturally formed in center ice, sunset gradient sky from deep blue to orange illuminating the scene from behind, golden light catching ice edges while shadows fall across the steps, 4K nature documentary cinematography, physically grounded ice textures, landscape orientation with bold sans-serif EDUCATION text overlay.

I pointed Claude Cowork at a set of 107 documents (PPTs, Word docs, Excel) that were initially hand-created for my class at Wharton & expanded on by AI. They make up a very complex business case with lots of issues & opportunities. AI was able to one-shot the case from documents https://x.com/emollick/status/2021638881158857204

So far “telling a satisfying and well-written medium-length story” has proved far harder for LLMs than mathematical proofs, music generation, research reports, code, and many other forms of work. The technical reasons are pretty clear, but they are supposed to be language models https://x.com/emollick/status/2020993610540605560

The poetry tastes of GenAI: “I want you to suggest two poems that you think apply very well the current state of GenAI models like you. Don’t just pick popular poems and back justify. Think hard about options first.” ChatGPT, Gemini & Claude all suggest Borges’s “The Golem” https://x.com/emollick/status/2021677609872986450

AI needs better evaluations. Today we’re announcing Arena’s Academic Partnerships Program to fund independent academic research in AI evaluation and measurement. ▫️Up to $50K/project. Q1 Deadline: March 31, 2026. See more in thread for details and how to apply 👇 https://x.com/arena/status/2021268433619374336

[2602.10177] Towards Autonomous Mathematics Research https://arxiv.org/abs/2602.10177

1/ AxiomProver has solved Fel’s open conjecture on syzygies of numerical semigroups, autonomously generating a formal proof in Lean with zero human guidance. This is the first time an AI system has settled an unsolved research problem in theory-building math and self verifies. https://x.com/axiommathai/status/2019449659807219884?s=20

Can just a 4B model solve Olympiad-level proof problems at the level of giant proprietary LLMs? We built QED-Nano 🚀, a 4B model that we carefully post-trained for Olympiad-level proof problems, matching 30x larger models like gpt-oss-120B. We specifically used RL recipes that… https://x.com/setlur_amrith/status/2022022298874917015
