“A new paper shows that AI agents are improving rapidly at long tasks – but they aren’t reliable yet. That being said this feels significant: “more than 80% of successful runs cost less than 10% of what it would cost for a human [L4 software engineer] to perform the same task.” https://x.com/emollick/status/1902443733158609005
“When will AI systems be able to carry out long projects independently? In new research, we find a kind of “Moore’s Law for AI agents”: the length of tasks that AIs can do is doubling about every 7 months. https://x.com/METR_Evals/status/1902384481111322929
“Relevant questions when discussing AI hallucination: 1) What is your tolerance for error? And how good are humans at getting that error rate? 2) Does the AI get more or less wrong than a human?” / X https://x.com/emollick/status/1902401547885015308
“When working with LLMs I am used to starting “New Conversation” for each request. But there is also the polar opposite approach of keeping one giant conversation going forever. The standard approach can still choose to use a Memory tool to write things down in between” / X https://x.com/karpathy/status/1902737525900525657
“Why? As model capabilities change (hello reasoning and LM assistants), benchmarks need to follow! The leaderboard is slowly becoming obsolete; we feel it could encourage people to hill climb in irrelevant directions. So this is the end! (hold your breath & count to 10)” / X https://x.com/clefourrier/status/1900280341887091071
“I regret to announce that the meme Turing Test has been passed. LLMs produce funnier memes than the average human, as judged by humans. Humans working with AI get no boost (a finding that is coming up often in AI-creativity work) The best human memers still beat AI, however. https://x.com/emollick/status/1901431681279475808
“So far, as LLM model size gets larger, it seems to have a direct effect on reducing known problems with LLMs. Bigger LLMs hallucinate less, show less bias, and are less sensitive to prompting style, among other things. Not that these problems go away, but they do decrease.” / X https://x.com/emollick/status/1901776990962630666
“There have been countless efforts to make software development “more visual”, but anything that isn’t a simple collection of human (and LLM!) readable text files continues to step on land mines.” / X https://x.com/ID_AA_Carmack/status/1902088032519405919
Measuring AI Ability to Complete Long Tasks – METR https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
“🔴Excited to share my TED AI talk! How we advance AI into an era of superintelligence, with true *practical* impact in our world. ❓I ask: For all the groundbreaking milestones we’ve reached, all the intelligence benchmarks we’ve shattered, for the $53B invested in generative https://x.com/stephzhan/status/1902377745335947580
““This is the year that AI gets better than humans at programming forever. And there’s no going back.” OpenAI CPO Kevin Weil highlights the rapid improvement of AI models in competitive coding, predicting one will reach the number 1 spot in 2025. He says AI surpassing humans in https://x.com/vitrupo/status/1901258363364897016
“I just created 4 weeks on content in 2 minutes with Manus! This is the closest I’ve felt to AGI. Manus creates separate docus with each 𝕏 post/thread saved in as drafts. Final step: copy over to Typefully or any post scheduler and automate your social media growth. https://x.com/Lyle_AI/status/1898538952186851663
“there is a lot of wisdom in this thread. joe coaches the research and compute teams at openai; i super enjoy working with him. one superpower is that he deeply understands emotional clarity and how to get there; this will be one of the most critical skills in a post-AGI world.” / X https://x.com/sama/status/1902751101134438471
“Kevin Weil, OpenAI’s CPO, says that the next obvious step for AGI beyond the digital world is robotics and real-world impact. https://x.com/TheHumanoidHub/status/1901544115000742364
“Watch for a 14min demo of me using Manus for the 1st time. It’s *shockingly* good. Now imagine this in 2-3 years when: – it has >180 IQ – never stops working – is 10x faster – and runs in swarms by the 1000s AGI is coming – expect rapid progress. https://x.com/mckaywrigley/status/1898756745545252866
““I believe now is the right time to start preparing for AGI” The same warnings are now appearing with increasing frequency from smart outside observers of the AI industry, like @kevinroose (below) & Ezra Klein. I think ignoring the possibility they are right is a real mistake. https://x.com/emollick/status/1900575976284660146




