Agents and Copilots: AI News Week Ending 02/14/2025

February 13, 2025

🌻AutoGen Python v0.4.5 released with a few exciting features: – Streaming tokens from Agents and Teams – Native support for R1-style reasoning output – Partial functions as tools Adds new samples to demonstrate these new features: – AgentChat + @chainlit_io with streaming –
https://x.com/pyautogen/status/1886534539327181308

🚀@LevangieLabs set a new record for UI navigation accuracy! 🧵 Let’s break down what makes CUA-NAV the most precise UI Navigation model. ⬇ Comparisons below! Shoutout to @gregosuri and the team at @akashnet_ for making this possible. CUA-NAV was built on @akashnet_ !
https://x.com/blevlabs/status/1886177565498323278

🚨 AgentStack v0.3.3 ~ New Framework support! 🦙 @llama_index added as an official framework 🔨 @PaymanAI tool added 🥰 Dev experience improvements 🪳 Bug fixes
https://x.com/braelyn_ai/status/1887958100612952080

o3 achieved 99.8th percentile on Codeforces
https://x.com/arankomatsuzaki/status/1889522980096712945

the hackbot singularity is coming
https://x.com/rez0__/status/1888801773558665464

The next phase of AI in commerce begins today🐳 With Moby agents that can do 👇 1️⃣Media buying 2️⃣inventory management 3️⃣retention marketing 4️⃣conversion rate optimization And so much more… We have 100 spots signup today
https://x.com/AY_Orbach/status/1886863552520032583

Try text2web at
https://x.com/lmarena_ai/status/1889496847708045496

writing any code manually in 2025 is like writing assembly to build a web app in 2024
https://x.com/vikhyatk/status/1889597476895662336

[R] o3 achieves a gold medal at the 2024 IOI and obtains a Codeforces rating on par with elite human competitors : r/MachineLearning https://www.reddit.com/r/MachineLearning/comments/1io4c7r/r_o3_achieves_a_gold_medal_at_the_2024_ioi_and/

🎯 “We don’t know yet what will be the most impactful tool in AI. So iterate. A lot.” - @Thom_Wolf to a packed crowd at @_STATIONF https://x.com/fdaudens/status/1889362819709014070

A few implications of tricks like this: 1) We are still VERY early in the development of Reasoners 2) There is high value in understanding how humans solve problems & applying that to AI 3) Higher possibility of further exponential growth in AI capabilities as techniques compound https://x.com/emollick/status/1887884562958569969

Apps created in FastHTML, @htmx_org, and MonsterUI can take a few hours to write, are really easy to maintain and add to, and look and feel great to use. https://x.com/jeremyphoward/status/1889430719988113911

Baidu to make AI chatbot Ernie Bot free of charge from April 1 | Reuters
https://www.reuters.com/technology/artificial-intelligence/baidu-says-ai-model-ernie-free-april-2025-02-13/

Excited to share details of AlphaGeometry2 (AG2), part of the system that achieved silver-medal standard at IMO 2024 last July! AG2 now has surpassed the average gold-medalist in solving Olympiad geometry problems, achieving a solving rate of 84% for all IMO geometry problems https://x.com/lmthang/status/1887928665100665111

Full benchmarks here: https://x.com/ArtificialAnlys/status/1889150373635715317

Harvey Raises $300M Series D Led by Sequoia
https://www.harvey.ai/blog/harvey-raises-series-d

It is kind of shocking how little AI assistance has influenced the discourse on this site, given there is a button next to every post that lets you ask a solid AI to fact check the details. It does a pretty good job. This has made no difference at all in the spread of nonsense. https://x.com/emollick/status/1888663558465945676

LLM framework for task decomposition and agents https://x.com/tom_doerr/status/1887125002459160848

Three Observations – Sam Altman
The intelligence of an AI model roughly equals the log of the resources used to train and run it.
The socioeconomic value of linearly increasing intelligence is super-exponential in nature.
The cost to use a given level of AI falls about 10x every 12 months, and lower prices lead to much more use.
https://blog.samaltman.com/three-observations
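
To make the first and third observations concrete, here is a rough arithmetic sketch. It assumes the 10x-per-12-months price drop is smooth and continuous, and it picks base 10 for the logarithm purely for illustration (the post does not specify a base). The function names are hypothetical and not taken from the post.

```python
def relative_cost(months: float, drop_per_year: float = 10.0) -> float:
    """Cost relative to today after `months`, assuming a steady
    `drop_per_year`-fold price decline every 12 months (assumption)."""
    return drop_per_year ** (-months / 12.0)


def resource_multiplier(intelligence_gain: float, base: float = 10.0) -> float:
    """If intelligence ~ log_base(resources), a linear gain in intelligence
    requires multiplying resources by base ** gain (base is an assumption)."""
    return base ** intelligence_gain


if __name__ == "__main__":
    for months in (6, 12, 24):
        print(f"after {months:>2} months: {relative_cost(months):.4f}x today's cost")
    print(f"+2 units of 'intelligence': {resource_multiplier(2):.0f}x the resources")
```

Under these assumptions, a fixed capability level costs about a hundredth as much after two years, while each additional unit of "intelligence" multiplies the resources required by a constant factor.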

New Work on InSTA: A pipeline for Internet-scale training of web agents across 150k diverse websites without human annotations. Paper + Code: https://x.com/rsalakhu/status/1889492471630946662

OpenAI just dropped Competitive Programming with Large Reasoning Models Notably, o3 achieves a gold medal at the 2024 IOI and obtains a Codeforces rating on par with elite human competitors. Overall, these results indicate that scaling general-purpose reinforcement learning, https://x.com/_akhaliq/status/1889523662732042610

Competitive Programming with Large Reasoning Models New paper from OpenAI highlighting results of their reasoning models on IOI and CodeForces. We competed live at IOI 2024 with o1-ioi and, using hand-crafted test-time strategies, placed in the 49th percentile. Under relaxed https://x.com/iScienceLuvr/status/1889517116816244995

i think every zoomer is building some version of rizz gpt because their brains are permanently damaged from lockdown and they don’t know how to have a normal conversation with a woman. which like, honestly same https://x.com/andersonbcdefg/status/1890079063169224831

Google presents Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2 https://x.com/_akhaliq/status/1887718062863855625

Deepseek is killing it! I’ve long been Team Claude for everything coding but Deepseek blew past Claude in our OSS PR review 81% critical bug to noise ratio with 3.7x more bugs caught! https://x.com/Aiswarya_Sankar/status/1887356821738037742

Browser Use UI can do DeepResearch👀 Repo is 100% open source ↓ Thanks @Gradio for powering the UI🔥 https://x.com/gregpr07/status/1887622796337197340

AI can now compete with the most elite human programmers in competitive coding. o3 won gold in the International Olympiad in Informatics, beating a specialized version of o1 designed for the contest. General-purpose reasoning done by big LLMs now beats hand-crafted strategies. https://x.com/emollick/status/1889775905435771145

if you are publishing a software library in 2025, the only way to get adoption is to also publish context.txt that people can paste into the LLM to get codegen to work correctly https://x.com/vikhyatk/status/1889540437557518843

Introducing OpenR1-Math-220k! https://x.com/_lewtun/status/1889002019316506684

We finally have an answer to the debate over whether LLMs generalize to new math problems or they merely memorized the answers. We evaluated them on the AIME 2025 I competition from *yesterday* and the results are good! https://x.com/mbalunovic/status/1887962694659060204?s=46

Data Formulator: Microsoft Research presents Data Formulator, an application that leverages LLMs to transform data and create rich visualizations. > pip install data_formulator https://x.com/omarsar0/status/1889325784512581785