Ethan B. Holland

Over 54,900 manually organized AI links and counting

AGI: AI News Week Ending 02/28/2025

February 28, 2025

27% of Job Listings For CFOs Now Mention AI – Slashdot https://slashdot.org/story/25/02/18/1718255/27-of-job-listings-for-cfos-now-mention-ai

“Vibe coding is how creative builders actually work. It’s the fusion of technical skill and artistic intuition — writing code and wrangling pixels, while surrendering to serendipity. It’s wu wei — effortless action in the digital realm. A movement was inevitable, and it took https://x.com/bilawalsidhu/status/1893462574504780273

“What’s truly exciting is that these robots can now generally pick up any household item For instance, we asked it to “Pick up the desert item” Helix identifies the toy cactus, chooses the nearest hand, and executes precise motor commands to grasp it securely! https://x.com/adcock_brett/status/1892579136956186947

“”Great now make a new snake game that is aware of the snake game you just made” That was it, the only prompt… https://x.com/emollick/status/1894480971648377198

“So far, at the start of every new model generational cycle, there are much bigger differences between models than later in the cycle. I expect some of the new models will be better at coding, others better at writing, etc. Those differences likely shrink as updates happen.” / X https://x.com/emollick/status/1894747074152808462

“Why did AI benchmarking decide on using consensus@64 or pass@64? Is there an intellectual or statistical basis for choosing 64 trials or even using the pass/consensus approaches?” / X https://x.com/emollick/status/1893047065724264678

“There is something about the LLM scaling laws that people don’t realize. When someone says we’ve reached the end of the scaling laws, they usually mean that we’ve trained AI models on all the data on the internet. They think there’s nothing left to train on, so logically, the https://x.com/JonathanRoss321/status/1892596586347160059

“we are on track for my 90% swe-bench verified prediction https://x.com/scaling01/status/1894096594225578129

“@karpathy Do you really think AI models won’t have agency soon too?” / X https://x.com/polynoamial/status/1894468586598797661

“How do agents plan and reason? Here are recent breakthroughs in reasoning that unlock advanced AI capabilities: 1. Chain-of-Thought (CoT) prompting 2. Self-reflection and self-consistency 3. Few-shot and in-context learning 4. Neuro-symbolic approaches 1. CoT prompting: Guides https://x.com/TheTuringPost/status/1893965514151719371

“Helix coordinates a 35-DoF action space at 200Hz Controlling everything from individual finger movements to end-effector trajectories, head gaze, and torso posture! https://x.com/adcock_brett/status/1892579000817521092

“Helix is a novel architecture, “System 1, System 2” > System 2 is an internet-pretrained 7B parameter VLM (big brain) > System 1 is an 80M parameter visuomotor policy (fast control) Each system runs on onboard embedded GPUs, making it immediately ready for commercial https://x.com/adcock_brett/status/1892579188424712682

“Our first customer use case took 12 months – our second, just 30 days Helix is enabling robots to scale with a single neural network On Sunday, we successfully tested robots on-site with the customer! https://x.com/adcock_brett/status/1894781636153405870

“We’re ramping up to ship humanoid robots at unprecedented levels in 2025 If you’re interested in AI and Robotics give us a follow: @Figure_robot Help us spread the word, Like/Repost the below: https://x.com/adcock_brett/status/1894782815981711810

“I’m very excited about this work! If SAEs work we hypothesised they should help probe in difficult regimes, but in 5 regimes and 100+ datasets linear probes won This was a negative update on SAEs for me and highlights the value of grounding interpretability with downstream tasks” / X https://x.com/NeelNanda5/status/1894749262757634405

“people underestimate the mental cost of outsourcing code to Copilot/Cursor it’s a mortgage: quick progress now at the expense of not understanding your own codebase it may be that beyond simple line autocomplete, it’s more efficient in the long run to do everything yourself” / X https://x.com/jxmnop/status/1894830128082940182

“Announcing a new AGI Benchmark: SholtoBench SholtoBench tracks which AGI lab the formidable Sholto Douglas (@_sholtodouglas) works at. Our comprehensive infrastructure uses AI agents to ensure we keep up to date with the latest public information. Huge thanks to all who helped! https://x.com/nearcyan/status/1892469757653442989

There’s Something Very Weird About This $30 Billion AI Startup by a Man Who Said Neural Networks May Already Be Conscious https://futurism.com/ilya-sutskever-safe-superintelligence-product

“gpt-4.5 has incredible world knowledge. on simpleqa (a not so simple factuality benchmark), it’s more accurate than any other model: >gpt-4.5 — 62.5% >grok-3 — 43.6% >gpt-4o — 38% >o3-mini — 15% https://x.com/aidan_mclau/status/1895204587608645691

“The Relationship Between Reasoning and Performance in Large Language Models o3 (mini) Thinks Harder, Not Longer https://x.com/_akhaliq/status/1893853535122374762

“BIG-Bench Extra Hard Google DeepMind introduces BIG-Bench Extra Hard (BBEH), a new benchmark designed to push the boundaries of LLM reasoning evaluation. “BBEH replaces each task in BBH with a novel task that probes a similar reasoning capability but exhibits significantly https://x.com/iScienceLuvr/status/1895044794147316073

“🚀 Day 0: Warming up for #OpenSourceWeek! We’re a tiny team @deepseek_ai exploring AGI. Starting next week, we’ll be open-sourcing 5 repos, sharing our small but sincere progress with full transparency. These humble building blocks in our online service have been documented,” / X https://x.com/deepseek_ai/status/1892786555494019098

HCI for AGI – Google DeepMind https://deepmind.google/research/publications/106025/

“AlphaMaze: Teaching a 1.5B LLM to think visually and solve ARC-AGI like puzzles! 🤯 Powered by DeepSeek R1 1.5B + GRPO All with Apache licensed checkpoints and dataset 🤗 https://x.com/reach_vb/status/1892999150255440012

“welcome, gpt-4.5 i’ve spent a lot of time playing with this model recently, and it’s left me feeling the agi some thoughts https://x.com/aidan_mclau/status/1895204299040530794

“Something this highlights is how intensely the models improvements are narrowing. Oais in math, anthropics in swebench. Does this bode well for agi? It seemed like improving any reasoning was generalizing across the board until i saw this Time to vibe check because these” / X https://x.com/Teknium1/status/1894100993815760945

“If we want interpretability to help make AGI safe, it must be applied in practice. So, at GDM, we’re starting a team aiming to use model internals in production to make Gemini safer! If that vision excites you, please apply! Engineers and researchers are both welcome. Due Friday” / X https://x.com/NeelNanda5/status/1894192519719907467
