About This Week’s Covers

This week’s cover pays homage to the namesake of xAI’s flagship language model, Grok. The big story this week is that Grok-3 was released, and the name Grok comes from the Robert Heinlein novel “Stranger in a Strange Land”. The word means ‘to understand intuitively or by empathy’. This week’s cover is a robot remake of the original novel’s cover. I tried to have Grok make the cover, but it failed miserably. I ended up using Flux Pro. Claude 3.5 Sonnet helped me identify the key descriptors to re-create the atmosphere of the image (below).

The category covers this week were built with Claude + Ideogram, using the entire prompt from Flux and a request to adapt it to each particular category.

This Week By The Numbers

Total Organized Headlines: 634

This Week’s Executive Summaries

The top story this week was that xAI launched Grok-3, which claimed the top spot in the AI model rankings. This puts the lag between xAI and OpenAI at about nine months. This is similar to the gap between DeepSeek and OpenAI. I’m not an expert, but my impression is that DeepSeek got to where it is through efficiencies and reinforcement learning, whereas xAI got its results by throwing massive amounts of computing power and scale at the problem (see the Colossus supercomputer project). Part of me wonders what the next models will look like when you combine these two techniques: scale and reinforcement learning. The rumor is that Anthropic is going to launch a new version of Claude in the next week or two. OpenAI and Meta also have new updates coming soon. The horse race continues!

The second big story this week is that Microsoft has developed a completely new state of matter that they plan to use for quantum computing. My takeaway here is that while other companies are chipping away—no pun intended—at quantum computing using traditional techniques, Microsoft took a step back, like Tiger Woods relearning his swing, came out with a completely new architecture, and literally created a new type of matter!

Google came out with a new AI research tool called Co-Scientist. A university researcher took 10 years of his own work and asked Co-Scientist what it thought the answer would be—without telling the system any details and using only a simple prompt. In 48 hours, Google’s system was able to recreate 10 years of research.

The New York Times has formally adopted AI in the newsroom, approving a variety of tools, including its own proprietary tool called Echo. This does not conflict with the NYT’s lawsuit against OpenAI, but it does show that AI is becoming widely embraced, even by the most traditional publishers.

Perplexity has reconfigured the Chinese open model DeepSeek and turned it into a new tool called Deep Research, which is able to research and reason at almost the same level as OpenAI’s $200/month research model.

Perplexity also retrained DeepSeek to remove its biases and censorship and has re-released it as R1 1776.

Robotics company Figure has released its own proprietary multi-modal model that allows robots to pick up objects they have never seen before and understand them simply by receiving natural language instructions. This comes only a few weeks after Figure announced that they would no longer partner with OpenAI. The new model for the robots runs at a spectacular 200 times per second and has been tested on thousands of household items that the robots had never seen before. Impressively, the system runs on regular computer chips.

Another big name from OpenAI announced a new company after several months of silence, presumably due to non-compete agreements. The former CTO of OpenAI, Mira Murati, has founded Thinking Machines Lab.

Microsoft came out with a new system that can let any large language model understand computer interfaces by detecting everything that can be clicked on and mapping their functions back to language commands.

Google announced that Gemini will now be able to remember all chat histories and converse between chats to give continuity and personality to users who want to resume where they left off or connect multiple chats.

OpenAI created a benchmark to test models against actual programming tasks using the website Upwork. OpenAI found about 1,400 tasks that language models struggle to complete and will use this benchmark to track progress over time. The goal is to see when language models are able to achieve $1 million in productivity value across multiple projects.

ChatGPT-4o has been upgraded without being renamed and is now tied for the number one position in six major categories. Of course, this all comes at the same time as Grok and other releases, so it’s almost like following a bouncing ball with the leaderboard.

xAI’s Grok-3 Claims Top Spot in AI Model Rankings
xAI released Grok-3, which ranks first on the Chatbot Arena leaderboard with a record-breaking score of 1400. Grok-3 outperforms leading models like GPT-4o, Gemini 2 Pro, and Claude 3.5 Sonnet on key benchmarks, particularly in math, science, and coding tasks. The model was trained using 200,000 NVIDIA H100 GPUs – double the hardware used for Meta’s Llama 4. The system includes three specialized modes: Think, Big Brain, and DeepSearch, with the latter offering web search capabilities similar to recent offerings from Google and OpenAI. While Grok-3 hit impressive benchmarks, it will take a few weeks to know whether it prevails in real-world tasks. Audio input and output features are planned for the coming weeks. One takeaway that is consistent with the recent DeepSeek disruption is that the gap between closed models and open models is about 6-9 months, a relentless pace.
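For a rough sense of what a 1400 score means head to head, here is a minimal sketch that assumes the leaderboard behaves like a standard Elo scale. That is an assumption for illustration: the Arena’s actual rating method may differ, and the 1300 rival rating is made up.

```python
def expected_win_rate(rating_a, rating_b):
    """Standard Elo expected score for model A against model B.

    Assumes the classic 400-point logistic scale; this is an
    illustration, not the leaderboard's documented method.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Hypothetical matchup: a 1400-rated model against a 1300-rated one.
print(round(expected_win_rate(1400, 1300), 2))
```

Under that assumption, a 100-point lead works out to winning roughly 64% of head-to-head votes, so even a record score is an edge rather than a blowout.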

“Grok 3 drops tomorrow night—xAI’s billion-dollar bet on scaling. Reminder: xAI built Colossus, the world’s most powerful AI training cluster (100,000+ NVIDIA H100s in just 122 days) to train Grok 3. This comes after DeepSeek-R1 tanked the stock market by delivering a strong https://x.com/rowancheung/status/1891151253951987737

“AI NEWS: Elon Musk’s xAI just unveiled Grok-3 and ranked #1 on the Chatbot Arena. Plus, more news from Mistral’s new regional AI Saba, Ilya’s SSI, Nous Research, and a new open-source Chinese video model. Here’s what you need to know:” / X https://x.com/rowancheung/status/1891773915560583258

“Here are the benchmark numbers: Grok 3 significantly outperforms other models in its category such as Gemini 2 Pro and GPT-4o. Even Grok-3 mini shows to be competitive. https://x.com/omarsar0/status/1891706611023938046

Grok3 Launch Video / X https://x.com/i/broadcasts/1gqGvjeBljOGB

“Grok 3 release with live demo on Monday night at 8pm PT. Smartest AI on Earth.” / X https://x.com/elonmusk/status/1890958798841389499

“Grok 3 reasoning beta achieved 96 on AIME and 85 on GPQA, which is on par with the full o3. https://x.com/arankomatsuzaki/status/1891708250199839167

“Grok 3 is a new best model in the world from the @xai team! Grok 3 ranks #1 on Chatbot Arena w/a big gap, and scores impressively on pretraining and reasoning evals. congrats to @elonmusk @ibab @jimmybajimmyba @Yuhu_ai_ looking forward to more partnership on grok4 & beyond 🚀 https://x.com/alexandr_wang/status/1891714169629524126

“BREAKING: xAI announces Grok 3 Here is everything you need to know: https://x.com/omarsar0/status/1891705029083512934

“Grok 3 involved 10x more training than Grok 2! Grok finished pretraining in early January! The model is still training. https://x.com/omarsar0/status/1891705957220016403

“Elon mentioned that Grok 3 is an order of magnitude more capable than Grok 2. https://x.com/omarsar0/status/1891705031243469270

“BREAKING: @xAI early version of Grok-3 (codename “chocolate”) is now #1 in Arena! 🏆 Grok-3 is: – First-ever model to break 1400 score! – #1 across all categories, a milestone that keeps getting harder to achieve Huge congratulations to @xAI on this milestone! View thread 🧵 https://x.com/lmarena_ai/status/1891706264800936307

“Grok-3 without reasoning actually looks pretty good on these 3 cherry picked benchmarks. It’s also a good sign that they got 1400 Elo n lmsys from the get go. However, I feel like this launch was rather underwhelming. Too few benchmarks, no report and no useful demos. If it’s https://x.com/scaling01/status/1891786871304323280

“This is it: The world’s smartest AI, Grok 3, now available for free (until our servers melt). Try Grok 3 now: https://x.com/xai/status/1892400129719611567

“Grok 3 Reasoning Beta performance on AIME 2025. Grok 3 shows generalization capabilities. It not only does coding and math problem-solving, but it can also do other creative and useful real-world tasks. https://x.com/omarsar0/status/1891711110476111884

Grok 3, xAI’s New Model Family, Improves on its Predecessors, Adds Reasoning https://www.deeplearning.ai/the-batch/grok-3-xais-new-model-family-improves-on-its-predecessors-adds-reasoning/

“Reasoning models like Grok-3 reasoning beta and DeepSeek-R1 are trained using reinforcement learning with verifiable rewards, but what exactly does this mean? Verifiable tasks. One detail that we should immediately notice about reasoning models is that they are primarily used https://x.com/cwolferesearch/status/1891893034956030242

“Grok 3 also has reasoning capabilities too! The Grok team has been testing these capabilities which they have unlocked using RL. The model is good, especially in coding. https://x.com/omarsar0/status/1891707915351859547

“If the light blue part is best of N scores, this means that Grok 3 reasoning is inherently an ~o1 level model. This means the capabilities gap between OpenAI and xAI is ~9 months. Also what is the difference between “think” and “big brain” https://x.com/nrehiew_/status/1891710589115715847

Microsoft Creates New State of Matter for Quantum Computing
Microsoft has developed Majorana 1, a quantum processing unit built on a new class of materials called topoconductors, which produce a new, topological state of matter. The breakthrough enables qubits that are 100 times smaller than current versions and could accelerate the timeline for practical quantum computers from decades to years. The microscopic qubits, measuring 1/100th of a millimeter, create a path to processors containing a million quantum bits. For now, there is not much practical use, but scientifically and for the future this is a big deal. Microsoft decided to go the slower route and create an entirely new state of matter, as opposed to banging its head against the bottlenecks of existing structures.

Satya Nadella on X: “A couple reflections on the quantum computing breakthrough we just announced… Most of us grew up learning there are three main types of matter that matter: solid, liquid, and gas. Today, that changed. After a nearly 20 year pursuit, we’ve created an entirely new state of https://t.co/Vp4sxMHNjc” / X https://x.com/satyanadella/status/1892242895094313420

Google AI Tool Solves Decade-Long Superbug Mystery in 48 Hours
Google’s tool called “co-scientist” identified how superbugs spread between species, matching conclusions that took Imperial College London researchers 10 years to reach. When Professor José Penadés gave the AI a “short prompt” about the problem behind his unpublished research on bacterial resistance, it worked through the question on its own and correctly described how superbugs form virus-like tails to move between hosts. The AI also proposed four additional viable hypotheses, including one novel approach the research team is now investigating.

AI cracks superbug problem in two days that took scientists years https://www.bbc.com/news/articles/clyz6e9edy3o

“NEW: Google introduces AI co-scientist. It’s a multi-agent AI system built with Gemini 2.0 to help accelerate scientific breakthroughs. 2025 is truly the year of multi-agents! Let’s break it down: https://x.com/omarsar0/status/1892223515660579219

“Google’s new “co-scientist” solved a complex microbiology problem in 48 hours, a task that took researchers at Imperial College London a decade to complete. Professor José R. Penadés tested the AI with a hypothesis about superbug resistance, and it correctly identified the https://x.com/rohanpaul_ai/status/1892746665225826321

NY Times Embraces AI Tools to Support Newsroom Operations
The New York Times is implementing AI tools to assist journalists with tasks like editing, headline creation, and interview preparation, while maintaining editorial oversight. The company’s new internal tool, Echo, helps staff summarize articles and create social media content. While embracing AI for support functions, humans will continue handling reporting and editorial decisions. The Times is continuing its lawsuit against OpenAI and Microsoft over unauthorized use of its content for AI training.

The New York Times adopts AI tools in the newsroom | The Verge https://www.theverge.com/news/613989/new-york-times-internal-ai-tools-echo

Perplexity Uses DeepSeek to Launch AI Research Agent That Rivals OpenAI
Perplexity has unveiled Deep Research, a tool that creates detailed research reports by combining web search, analysis, and coding capabilities. The service matches OpenAI’s performance on industry benchmarks while operating significantly faster and at lower cost, leveraging DeepSeek’s open-source model. Users can generate reports on topics ranging from business incorporation guidance to investment analysis, with free users receiving 5 queries daily and Pro subscribers getting 500. Most reports are completed within 3 minutes.

“Deep Research as a small business incorporation legal consultant. Usually charged hundreds or even thousands of dollars an hour to offer this. Now free, only on Perplexity. https://x.com/AravSrinivas/status/1891563239240069245

“Perplexity Deep Research can write an investment memo like Bill Ackman. Example: writing a memo for taking a big position in $UBER. https://x.com/AravSrinivas/status/1891233048605184371

“Deep Research on Perplexity scores 21.1% on Humanity’s Last Exam, outperforming Gemini Thinking, o3-mini, o1, DeepSeek-R1, and other top models. We also have optimized Deep Research for speed. https://x.com/perplexity_ai/status/1890452359773405675

“Perplexity just announced Deep Research (PDR)! I’m now testing and comparing it with OpenAI’s Deep Research (ODR). I still think the o3 variant powering ODR is a massive advantage. 20.5% (PDR) vs. 26.6% (ODR) on Humanity’s Last Exam. https://x.com/omarsar0/status/1890525249977872640

“Perplexity Deep Research is quite close to OpenAI o3 on the Humanity Last Exam Benchmark despite being an order of magnitude faster and cheaper. This is possible because DeepSeek is open source and cheap and fast. https://x.com/AravSrinivas/status/1890486069361025040

Perplexity Removes Content Restrictions from DeepSeek AI Model, Names It R1 1776
Perplexity’s R1 1776 is a post-trained version of the Chinese lab DeepSeek’s R1 model that provides unrestricted responses while maintaining its problem-solving abilities. Testing across 1,000+ examples confirmed the model responds to sensitive topics while preserving its math and reasoning capabilities. If it lives up to the hype, the MIT-licensed model will bring the power of a frontier model to a staggering number of third-party and personal use cases.

“Perplexity just released POST TRAINED DeepSeek R1 for factual and unbiased information – MIT Licensed 🔥 https://x.com/reach_vb/status/1891922768892989559

“Perplexity just dropped R1 1776 a version of the DeepSeek R1 model that has been post-trained to provide uncensored, unbiased, and factual information https://x.com/_akhaliq/status/1891961543455031429

“🎯 @perplexity_ai drops their FIRST open-weight model on @huggingface: A decensored DeepSeek-R1 with full reasoning capabilities. Tested on 1000+ examples for unbiased responses. https://x.com/fdaudens/status/1891949269470351833

Figure’s AI System Lets Humanoid Robots Handle New Objects Through Simple Commands
Figure has launched Helix, an AI system that enables humanoid robots to pick up unfamiliar objects and work together using natural language instructions. The system controls the robots’ upper body movements – including fingers, wrists, and torso – at 200 times per second. In testing, robots using Helix successfully handled thousands of household items they had never encountered before, from toys to glassware, by responding to basic commands like “pick up the coffee mug.” The system runs on standard computer chips and needs no additional training to perform new tasks, suggesting potential for real-world applications. This is notably just a few weeks after Figure announced that they were no longer going to partner with OpenAI on language integration, but rather build their own systems.

Helix: A Vision-Language-Action Model for Generalist Humanoid Control https://www.figure.ai/news/helix

“In our lifetime you will see more humanoid robots than humans when you’re out and about” / X https://x.com/adcock_brett/status/1889946006558744898

“. @Figure_robot just unveiled Helix, a Vision-Language-Action (VLA) model powered humanoid robots to reason and interact naturally in home environments. 📌 Unlike prior systems, Helix allows robots to pick up any household item without training, collaborate in real-time, and https://x.com/rohanpaul_ai/status/1892662504054259779

Former OpenAI CTO Launches Thinking Machines Lab, Emphasizing Open Science and AI Customization
Mira Murati, previously CTO at OpenAI, has founded Thinking Machines Lab, bringing together talent from prominent projects like ChatGPT, Character.ai, and PyTorch. The company’s broad mission is to keep AI systems open, understandable, and customizable while advancing frontier research. The team plans to build multimodal AI systems that can excel in diverse fields beyond current AI’s strengths in programming and mathematics. It’s pretty vague and mirrors a lot of what we’re hearing from Safe Superintelligence, another start-up by OpenAI co-founder Ilya Sutskever that’s in the news this week, also after several months of silence. I assume these delays correspond to non-competes or something to that effect.

“Career Update: Incredibly fortunate and excited to be part of the founding team at Thinking Machines Lab! https://x.com/dchaplot/status/1891920016339042463

Thinking Machines Lab https://thinkingmachines.ai/

“Today, we are excited to announce Thinking Machines Lab ( https://x.com/thinkymachines/status/1891919141151572094

AI Agents: Three Big AI Automation Stories to Know About
1 – Convergence’s Proxy 1.0 is built for automated web browsing and allows users to schedule and automate routine web tasks, including logging into systems and downloading files.
2 – Microsoft’s OmniParser V2 enables any LLM to understand computer interfaces by detecting all clickable elements and mapping their functions to plain language commands.
3 – Hugging Face launched an Agent Leaderboard that ranks 17 frontier models across 14 real-world benchmarks. Unfortunately, it does not include “wrapper” tools like Convergence.

“Introducing Proxy 1.0 – the world’s most capable web-browsing agent. https://x.com/convergence_ai_/status/1892129466610073931

Introducing Our Agent Leaderboard on Hugging Face – Galileo AI https://www.galileo.ai/blog/agent-leaderboard

“Microsoft just dropped OmniParser V2, looks incredible Turning Any LLM into a Computer Use Agent https://x.com/_akhaliq/status/1890546832784208080

Google Adds Infinite Chat Memory to Gemini AI
Gemini can now reference past conversations when responding to users, allowing seamless continuation of previous discussions and topic summaries. Users maintain control with options to view, edit, or delete their chat history, marking a huge step forward in AI chat understanding the larger context of each user (no pun intended). I’m hoping that OpenAI and Anthropic enable this sort of feature sooner rather than later because I find Gemini to be the worst of all of the models. You’d have to pay me to use it at this point.

“LMAO, so Google just dropped infinite memory for Gemini before OpenAI did for ChatGPT. It can now recall past conversations , you can refer to something discussed a week ago. How long has OpenAI been working on this? 😆 Note: To ask Gemini to reference past chats, you need Gemini https://x.com/ai_for_success/status/1890377941579891003

“Rolling out starting today, you can ask Gemini to consider your past chats to craft its responses. Easily pick up where you left off or have it summarize a previous topic. You can view, edit, or delete any chats you’ve had with Gemini, and see when it’s used. Try it in Gemini https://x.com/GeminiApp/status/1890137961871605863

OpenAI Creates Benchmarks to Test AI Models Against Real Programming Jobs
OpenAI has created SWE-Lancer, a benchmark testing AI models against 1,400 actual software engineering tasks from Upwork worth $1 million in client payments. The benchmark examines both coding tasks (from simple bug fixes to complex features) and project management decisions. When advanced AI models attempt these real-world challenges, they fail most of them, which leaves the benchmark room to measure improvements over time. OpenAI has made the testing framework public with the goal of measuring AI’s practical impact on software development.
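To make the “earn $1 million” framing concrete, here is a minimal sketch of payout-weighted scoring. It assumes a model earns a task’s payout only when its solution passes, which is my reading of the summary above, not OpenAI’s actual harness. The `Task` class, its fields, and the sample results are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """One freelance task; every field here is made up for illustration."""
    name: str
    payout_usd: float  # what the client paid for the task
    passed: bool       # whether the model's solution was accepted


def earned_usd(tasks):
    """Sum the payouts of the tasks the model solved."""
    return sum(t.payout_usd for t in tasks if t.passed)


def earned_share(tasks):
    """Fraction of the total available payout the model earned."""
    total = sum(t.payout_usd for t in tasks)
    return earned_usd(tasks) / total if total else 0.0


# Hypothetical results, not real benchmark data.
results = [
    Task("fix login bug", 250.0, True),
    Task("add export feature", 1000.0, False),
    Task("choose best proposal", 750.0, True),
]

print(f"Earned ${earned_usd(results):,.0f} "
      f"({earned_share(results):.0%} of the available payout)")
```

Scoring in dollars rather than a plain pass rate means a failed $1,000 feature costs more than a failed $250 bug fix, which is closer to how a client would judge the work.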

“OpenAI announces SWE-Lancer Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? https://x.com/_akhaliq/status/1891721712296747126

ChatGPT-4o Update Tops Performance Rankings Across Multiple Categories
OpenAI’s latest ChatGPT-4o version is tied for the #1 position in six major testing categories, including coding, creative writing, and handling complex conversations. Math remains an area for development.

“A new version of @OpenAI’s ChatGPT-4o is now live on Arena leaderboard! Currently tied for #1 in categories: 💠Overall 💠Creative Writing 💠Coding 💠Instruction Following 💠Longer Query 💠Multi-Turn This is a jump from #5 since the November update. Math continues to be an area https://x.com/lmarena_ai/status/1890477460380348916

AI Visuals and Charts: Week Ending February 21, 2025

“Here are the benchmark numbers: Grok 3 significantly outperforms other models in its category such as Gemini 2 Pro and GPT-4o. Even Grok-3 mini shows to be competitive. https://x.com/omarsar0/status/1891706611023938046

“Bro is driving around a 3D Gaussian Splat of a city https://x.com/bilawalsidhu/status/1891845261401501765

“Alibaba strikes again. Full-body swap anyone in a video with just a photo reference. What’s wild to me is that this tech completely bypasses the 3d pipeline (i.e. what Wonder Dynamics does to accomplish similar output) and yet looks so damn good. Basically viggle on steroids.” / X https://x.com/bilawalsidhu/status/1890535455600369687

Introducing Helix – YouTube https://www.youtube.com/watch?v=Z3yQHYNXPws

Flavio Adamo on X: “🚨 o3-mini crushed DeepSeek R1 🚨 “write a Python program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically” https://t.co/xEvPDzzbVk” / X https://x.com/flavioAd/status/1885449107436679394

“We are claiming SOTA for AI Avatar, but the ultimate test is big face. We don’t use post process or blurring hacks to hide misery. 5 videos. Same script. Generate with Argil 👇 https://x.com/BrivaelLp/status/1890435559127986463

“2. Goku+: Product and Human Interaction https://x.com/minchoi/status/1890074266244395495

https://x.com/_akhaliq/status/1890215479047754194

“I’m in awe after seeing an exhibit where a robot dog lunges at you. What is this! This is insane!!! https://x.com/takahiroanno/status/1890350288554397709

“A Japanese expo displayed a chained robot dog that was programmed to attack anyone who came near While the specifics of the creepy Black Mirror-like robot remain unclear, the message is evident: we need to do more on the AI safety front https://x.com/adcock_brett/status/1891177603312067056

“ByteDance presents Phantom Subject-consistent video generation via cross-modal alignment https://x.com/_akhaliq/status/1892073250974216476

Top Links of The Week – Organized by Category

AgentsCopilots

“I can now run models as good as the original ChatGPT locally on my iPhone Among other things it shows: how quickly AI is improving, the evolution of open weights models (this is Llama 3.1 8B), the opportunity for intelligence everywhere & the weakness of Siri AI compared to SoTA https://x.com/emollick/status/1891606625883955706

“OpenAI Deep Research is not built for prediction (people always ask about stock & demand forecasts) it is built for analysis: make an argument about a point of view and the evidence to support it. This is what lawyers, accountants, academics, analysts, and entrepreneurs do a lot.” / X https://x.com/emollick/status/1890063434022347257

“you can now deploy AI agents to browse the internet, automate tasks, and execute complex workflows on your behalf. this is version zero of the tech (chatgpt operator etc). imagine version ten!!!!! how are you sleeping at night!!!” / X https://x.com/gregisenberg/status/1888240832479682610

“.@TallyApp is building AI agents for accounting firms. Tally agents operate in a firm’s existing systems, automating repetitive work across accounting, tax, and audit, helping teams be more efficient. https://x.com/ycombinator/status/1889751328836006052

FinRL-DeepSeek – new trading AI agents combining Reinforcement Learning with Large Language Models — AstroWind https://melwy.com/finrl_deepseek

“we just used @Replit agent to build a browser agent 🤯 you might not want to give @OpenAI your $200 just yet… this cost $6.14 to build. here it is browsing @ProductHunt and making a list from a prompt… https://x.com/MakerThrive/status/1888998516250304900

“I built a multi-agent framework that pre-approves loan applications in under 5 minutes 🎉 1. Integrated 5 agents that make use of @Experian and @Plaid APIs for real-time credit and financial verification 2. The agents look up credit score, DTI, LTV, assets, property valuation https://x.com/n_sri_laasya/status/1890132203419627562

“Announcing Native Mobile App support on Replit. Now you can build iOS and Android apps that you can take all the way to the App Store without writing any code, powered by Replit Assistant. This is early access—full Agent support coming soon! https://x.com/amasad/status/1888727685825699874

“📈 A multi-agent Financial Analyst with #crewai 💰 The user submits a query and then: ✅ Query Parser Agent extracts structured outputs form the query using #pydantic ✅ Code Writer Agent writes #python code to visualize the stock data using #pandas, #matplotlib, and https://x.com/crewAIInc/status/1889437990214357125

“If you’re building AI Agents, this is the only podcast you need Here are my favourite insights from the latest episode of the AI Agents podcast with @kwindla and @KaranVaidya6, 1. Prioritize Low-Latency for Voice Interactions: If your AI agent uses voice, choose WebRTC over” / X https://x.com/_Prathit/status/1888612348140880145

“📽️ Just created an agent that: 1⃣ Searches @YouTube for recent popular videos that match a specified topic 2⃣Summarizes the videos 3⃣Drafts (or sends!) an email to me with the research summary. <2min to create using @ComposioHQ, Gemini 2.0, @YouTube, @GMail, and @llama_index. https://x.com/DynamicWebPaige/status/1887897486770974770

“Right now, I do not think an open Deep Research is possible – the magic seems to be in the o3 model more than the agentic use of search engines, etc The fact that the model “follows its curiosity” and delivers very low hallucination and insightful answers is an LLM quality thing” / X https://x.com/emollick/status/1889904648527683896

“DeepSeek R1 was just the start—this new Chinese research from @Kimi_Moonshot lets RAG AI agents devour entire codebases and documentation with no context limits. Mixture of Experts and Sparse attention make near-infinite context possible. 🧵1/n 📌 Challenge of Long-Context https://x.com/rohanpaul_ai/status/1892535262879617101

“Never before has there been such an accessible knowledge work agent freely available, as Perplexity Deep Research. What are some other demos you would like to see with it? Please comment here. Also, what are the pain points you’re seeing today? We will look into supporting a” / X https://x.com/AravSrinivas/status/1891344192707969134

“Microsoft silently updated OmniParser on the hub 👀 60% faster than v1 – sub-second latency on a 4090! “OmniParser is a general screen parsing tool, which interprets/converts UI screenshot to structured format, to improve existing LLM based UI agent.” Bonus: you can try it https://x.com/reach_vb/status/1891467489030082875

“🔥 Just finished Unit 1 of the @HuggingFace Agents Course and wow – this thing is incredible! Crystal clear deep dive into AI Agents 🤯 Can’t recommend enough! #AI #HuggingFace https://huggingface.co/learn/agents-course/en/unit1/introduction

“🔍 How LinkedIn enhances Sales Navigator with LangChain To refine LLM-powered features like AccountIQ for sellers, @LinkedIn built collaborative prompt engineering playgrounds using Jupyter Notebooks & LangChain. In this blog post, see how they: 🔹 Automated company research” / X https://x.com/LangChainAI/status/1890531416800383074

“Last week, @NWischoff asked for an “AI agent to underwrite a mortgage or even do a partially underwritten pre-approval in less than 24 hours.” @AgentOpsAI build a fully functional agent that pre-approves loan applications in under 5 minutes. Comment below if you are interested https://x.com/AtomSilverman/status/1891965335169008133

“Microsoft presents: Magma: A Foundation Model for Multimodal AI Agents – SotA on UI navigation and robotic manipulation tasks – Pretrained on a large dataset annotated with Set-of-Mark (SoM) for action grounding and Trace-of-Mark (ToM) for action planning. https://x.com/arankomatsuzaki/status/1892059107479224384

“@Entelligence ❤️’s OSS! You can now understand any codebase with realtime documentation, tutorials, codebase chat, code reviews and more by simply changing https://x.com/Aiswarya_Sankar/status/1890502637360992451

Fiverr Go | AI-Powered Tools to Amplify Human Talent https://www.fiverr.com/go

“🛠️ I built my portfolio website using @lovable_dev https://x.com/gusgarza_/status/1888284667339575375

“LLMs will solve document recognition. They will also solve entire document workflows that traditionally requires an entire backoffice team and a week’s worth of effort to solve. 1. Contract Review 2. Invoice Processing and Reconciliation 3. Compliance Reporting LlamaCloud is https://x.com/jerryjliu0/status/1890559184372134006

Anthropic

Introducing the Anthropic Economic Index \ Anthropic https://www.anthropic.com/news/the-anthropic-economic-index

Audio

BREAKING: Grok’s voice unveiled https://x.com/teslaownersSV/status/1891719294469222495

Organizational life is about to get much weirder. This paper creates an early form of meeting delegates, where you send an AI to a meeting on your behalf, and it uses your voice and knowledge to advance your agenda A lot of old organizational methods need to be rethought for AI https://x.com/emollick/status/1891527817826828565

BusinessAI

“The inaccuracy in financial queries on Perplexity Deep Research is being addressed. Examples of current failures include using old bitcoin prices or old market caps of the companies for a task like make me an investment case for Palantir or Bitcoin. We’re fixing this. For finance” / X https://x.com/AravSrinivas/status/1891535474315149662

“.@harvey__ai is THE vanguard AI app startup 🆕 @NoPriorsPod w/ Cofounder-CEO @winstonweinberg on: *capability improvement *co-pilots, end to end task completion, and product strategy for AI apps *selling AI to enterprise *hiring philosophy *what lawyers do in 5 years links 👇 https://x.com/saranormous/status/1890437612327874751

“Today we’re launching SWE-Lancer—a new, more realistic benchmark to evaluate the coding performance of AI models. SWE-Lancer includes over 1,400 freelance software engineering tasks from Upwork, valued at $1 million USD total in real-world payouts. https://x.com/OpenAI/status/1891911123517018521

“Representative survey of US workers finds that GenAI use continues to grow: 30% use GenAI at work, 1/3 of those use it every day. And the productivity gains appear large: workers report that when they use AI it triples their productivity (reduces a 90 minute task to 30 minutes) https://x.com/emollick/status/1890236327582658595

“Congrats on company launch to Thinking Machines! Very strong team, a large fraction of whom were directly involved with and built the ChatGPT miracle. Wonderful people, an easy follow, and wishing the team all the best!” / X https://x.com/karpathy/status/1891938714915569711

ChipsHardware

“the grok 3 release made me sad. something fatalistic about falling back to bruteforce scaling — 100x more compute than R1 for a model that’s at most 10% better all that time, money, and electricity spent on a system that will be obsolete before my semester ends AI needs new” / X https://x.com/jxmnop/status/1892725541796446350

“NVIDIA + Arc Institute’s new model Evo 2 just demonstrated that deep learning can directly model biological function It stands as a breakthrough in computational biology, 🧵 1/n Evo 2 just redefined genomic modeling by processing over 9 trillion nucleotides to seamlessly https://x.com/rohanpaul_ai/status/1892383673887985738

EthicsLegalSecurity

“so. this will never* happen because law firms have to be owned 100% by attorneys. so it’s impossible* to make a law firm that is also a tech company that rewards non-lawyers with equity” / X https://x.com/andersonbcdefg/status/1891329620558983633

“I think this is directionally accurate tbh I am increasingly bearish on service companies (consulting/accounting/law firms) increasing their margins and becoming more akin to software companies” / X https://x.com/finbarrtimbers/status/1891308159530082315

“Many on Twitter expressed surprise at China’s rapid progress—especially with projects like DeepSeek and Qwen—but for those of us who have been closely following the latest ML research, these developments were hardly unexpected. Although these projects have attracted attention https://x.com/arankomatsuzaki/status/1891886896659517754

“Really hard to know what jobs are the best jobs in the future. We have pretty good evidence of which jobs are going to overlap most with AI (coding, yes, but also teaching, journalism, marketing, etc.) but overlap doesn’t mean destruction, it can just mean major transformation.” / X https://x.com/emollick/status/1889905489196069079

“put differently if Harvey (WLOG) is actually the future of law (which I largely believe) the natural conclusion is to launch their own law firm and dominate the vertical” / X https://x.com/finbarrtimbers/status/1891308426841461054

“As someone who is pretty good at keeping up with AI, I can barely keep up with it all. That leads me to believe that very few other people are keeping up, either. So, on one hand, don’t feel bad you aren’t on top of it all. On the other, it means no one has the whole picture now” / X https://x.com/emollick/status/1891913605890609624

Google

“Google presents: SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Opensources model ckpts with four sizes from 86M to 1B https://x.com/arankomatsuzaki/status/1892777324715634971

“We are starting to see what “AI will accelerate science” actually looks like. This Google paper describes novel discoveries being made by AI working with human co-scientists (something I think we have all been waiting to see), along with an early version of an AI scientist. https://x.com/emollick/status/1892269913894420743

OpenAI

OpenAI looking at 16 states for data center campuses tied to Stargate https://www.cnbc.com/2025/02/06/openai-looking-at-16-states-for-data-center-campuses-tied-to-stargate.html

“The Guardian Media Group inks a deal with OpenAI https://x.com/fdaudens/status/1890502321047568705

Reasoning best practices – OpenAI API https://platform.openai.com/docs/guides/reasoning-best-practices

OpenSource

“Grok 3 release with live demo on Monday night at 8pm PT. Smartest AI on Earth.” / X https://x.com/elonmusk/status/1890958798841389499

“i love the janitor, but just accept that Grok-3 is the most powerful PUBLICLY AVAILABLE LLM (at least for a day lol) Look at the condition of the bet. Grok-3 delivered. https://x.com/scaling01/status/1891842735834808708

“The radical transparency here is incredible. Nobody is doing this at the level of DeepSeek” / X https://x.com/casper_hansen_/status/1892835887446159409

Robotics

Protoclone, the world’s first bipedal, musculoskeletal android. https://x.com/clonerobotics/status/1892250639360561234

“Austin-based robotics company Apptronik raised $350M, with Google in participation. The company is working with Deepmind to develop embodied intelligence in ‘Apollo’ humanoid. It’s being tested by Amplifier, GXO, and Mercedes-Benz https://x.com/adcock_brett/status/1891177390388191293

Meta Plans Major Investment Into AI-Powered Humanoid Robots – Bloomberg https://www.bloomberg.com/news/articles/2025-02-14/meta-plans-major-investment-into-ai-powered-humanoid-robots

TwitterXGrok

Grok3 Launch / X https://x.com/i/broadcasts/1gqGvjeBljOGB

Video

“this looks insane, snapchat just dropped Dynamic Concepts Personalization from Single Videos propose a new technique for personalizing text-to-video models, enabling them to capture, manipulate and combine Dynamic Concepts. https://x.com/_akhaliq/status/1892782271763034385

“Today we’re launching Pikaswaps: replace anything in your videos using photos you upload, or scenes you describe. The results are unbelievably believable, and the possibilities are as unlimited as your imagination. Try it at Pika dot art https://x.com/pika_labs/status/1892620122818294109


Discover more from Ethan B. Holland
