About This Week’s Covers

This week’s cover is all about robots training in simulations to perform tasks in the real world. Two big stories from 1X and NVIDIA remind me that the real future of AI is going to catch us all by surprise: robots everywhere! The cover image references one of the most iconic skateboard graphics of all time, the Powell Peralta Ripper logo created by VC Johnson in 1983. The graphic has been reimagined as a robot breaking out into the real world on a skateboard deck. The simulations that train robots are good enough to have a robot balance on a ball on the first try, so the skateboard is a nod to physics engines and balance.

Graphic created with Flux Dev using the prompt: “A scary humanoid robot is tearing through a jagged red paper hole on a white background. The robot has a surprised or intense expression. Its hands grip the edges of the torn background as if breaking through. The entire image has a slightly eerie and intense feel, with high contrast between the robot and the background”.  Skateboard created with Ideogram using the prompt: “Product photography. A plain skateboard deck on a white background.”  Final composite made with Photoshop.

This week’s cover images are a combination of Midjourney, Ideogram, and Flux. Here is a quick comparison of Midjourney vs. Ideogram on the same prompt. Midjourney is absolutely horrible with complex compositions.

A businessman wearing a tie with the Microsoft logo on it.

Ideogram result:

A businessman wearing a tie with the Microsoft logo on it.

Midjourney result:

A businessman wearing a tie with the Microsoft logo on it.

This Week’s Executive Summaries

Apple Tests AI-Enhanced Siri for iPhone 16 with “Apple Intelligence” Features
Apple is rolling out a test version of its AI-enhanced Siri on iPhone 16, aiming to make the virtual assistant more conversational and useful. This update, part of Apple’s new “Apple Intelligence” initiative, allows Siri to automate tasks, summarize emails, proofread documents, and perform photo editing—all powered by the iPhone 16’s advanced processors. A future update will enable Siri to interact with third-party apps.  
yahoo

Last weekend, before this Apple announcement came out, I wrote a post about what Apple is doing. It’s called “Apple is pulling a Braveheart and can change the way we use phones whenever they choose.”

Jony Ive and Sam Altman Join Forces to Create a New AI Device
Jony Ive, the former Apple designer behind the iPhone, is teaming up with OpenAI’s CEO, Sam Altman, to develop a new device powered by artificial intelligence. Five years after leaving Apple, Ive is working through his own design firm, LoveFrom, and has bought several buildings in San Francisco’s Jackson Square to create a collaborative space. Here, his team is developing a product that uses AI to make tech interactions less distracting and more helpful. This project, still under wraps, combines Ive’s design expertise with OpenAI’s advancements in generative AI—technology that can understand, summarize, and respond to human needs. Ive and Altman, who envision raising $1 billion for this effort, are aiming to create something that redefines how people interact with technology, making it more intuitive and human-centered.
Nytimes | 8teapi 

1X Launches “World Model” to Help Robots Learn and Predict the Real World
AI robotics company 1X has created “World Model,” a virtual simulation tool to help robots better understand and navigate their surroundings. This new model uses data from real-life settings, like homes and offices, to simulate scenarios, helping robots learn how objects will react when they’re moved, dropped, or interacted with. Unlike previous simulators that struggle with soft or flexible objects (like clothes or doors), World Model can predict interactions with more everyday items. It also solves a big problem for robotics: testing robots in the real world can be unreliable as surroundings constantly change, making it hard to see if updates are actually improving the robot’s skills. To boost development, 1X is running a “World Model Challenge,” where researchers can win cash prizes for creating more efficient simulations. The goal is to inspire new ways to train robots, making them smarter and more reliable in complex, unpredictable environments. Robots of all kinds are the #1 “end game” that I’m following for AI as a whole, and I encourage everyone to take it seriously and read as much as you can. Here’s my archive of robot links: https://ethanbholland.com/category/ai/robotics-embodiment/
1x | rowancheung

My Favorite AI Robotics Researcher, Dr. Jim Fan, on the Sequoia Capital Podcast Discussing NVIDIA’s Goal to Make Robots as Common as Smartphones
On Sequoia Capital’s “Training Data” podcast, Jim Fan, a lead scientist at NVIDIA, describes the company’s vision to make intelligent, humanoid robots as widespread and useful as smartphones. Heading NVIDIA’s robotics team, Fan is developing a project designed to give robots a general understanding that lets them learn a range of skills, much like how today’s language models work with human language. To bring this vision to life, NVIDIA blends data from the internet, simulations, and real-world robot experiences, creating adaptable models capable of real-world tasks. Thanks to NVIDIA’s graphics expertise, their simulations generate thousands of years of synthetic practice data, preparing robots to function seamlessly in human environments, sometimes with no previous real-world experience. While some might question the focus on human-shaped robots, Fan argues that a human design helps them fit naturally into our world, making them more versatile. Dr. Fan is absolutely my favorite researcher and I recommend listening if you want to think bigger than “chat bots” and see where things are heading.
Sequoiacap

Microsoft Adds New AI Tools to Office for Collaboration and Productivity
Microsoft has rolled out updates to Copilot, its AI assistant built into apps like Word, Excel, PowerPoint, and Teams. The latest feature, Copilot Pages, is a new workspace where teams can work together on AI-generated content, allowing everyone to see updates and make changes in real time. Microsoft is also upgrading Copilot across its apps: Excel can now work with Python for advanced data analysis, PowerPoint helps create presentations faster, Teams can recap both spoken and chat discussions from meetings, and Outlook organizes emails by importance. Plus, Copilot Agents now make it easier to automate routine tasks, and a new Agent Builder tool lets users create their own AI helpers in apps like SharePoint.
Microsoft | AtomSilverman | theverge | adcock_brett

Amazon-owned Zoox to Launch Fully Autonomous Robotaxi Service in San Francisco
Amazon-owned Zoox is preparing to launch its fully autonomous, driverless passenger rides in San Francisco in the near future. The vehicles, which have no steering wheels and feature a unique, symmetrical design allowing them to drive in either direction, are currently limited to a maximum speed of 45 MPH and are restricted to city operations for the initial rollout. Unlike other autonomous services that have partnered with ride-hailing platforms, Zoox will operate its own standalone network.
Pitdesi

Three Mile Island to Reopen as Energy Source for Microsoft Data Centers
The Three Mile Island nuclear plant, known for the worst U.S. nuclear accident in 1979, is set to reopen by 2028 to supply power to Microsoft’s data centers. In partnership with Constellation Energy, the 20-year agreement will deliver reliable, carbon-free energy to support Microsoft’s cloud and AI operations, adding $16 billion to Pennsylvania’s GDP and creating thousands of jobs. Renamed the Crane Clean Energy Center, the plant aims to address increasing demand for sustainable power in tech. While nuclear energy proponents tout its consistent output and low emissions, some activists question public funding and express reservations about reviving nuclear power in the state.
Npr | constellationenergy 

Alibaba Unveils Qwen2.5, The Most Powerful Open-Source Model in The World
Alibaba has launched Qwen2.5, an expansive family of open-source AI models, including versions tailored for general use, coding, and advanced math. The models come in a range of sizes, from 0.5B to 72B parameters, offering developers substantial flexibility at a fraction of the cost of proprietary models. Qwen2.5 supports 29 languages(!) and handles complex tasks like JSON output generation, long-text comprehension, and structured data analysis with high efficiency. Benchmark results show that Qwen2.5 and its variants outperform many larger models in coding and math-specific tasks, highlighting the progress of small language models (SLMs).
Adcock_brett | bindureddy | _philschmid | Alibaba_Qwen | rohanpaul_ai | Alibaba_Qwen | github

AI Visuals and Charts: Week Ending 09/20/2024

Kling Introduces “Motion Brush” – must see video

“Kling AI def nailed it with their motion brush implementation — a good way to exert control and tame the chaos / slot machine nature of video diffusion models. The UX reminds me of DragGAN, but better thanks to the segmentation. Hope all other AI video tools learn from this.” / X

“Prompt: [Soldier stands up and walk to the left] This is insane! 

A compelling animation of the various AI labs as they compete for the top of the leaderboard over time

 “Video showing race of AI labs showing top ranked models from each lab from @lmsysorg. OpenAI leads by over 50 points, something we haven’t seen since March 2024. CC: @altryne @aidan_mclau @btibor91 @BorisMPower @Scobleizer @MatthewBerman @8teAPi 

8bit_e is an incredible AI video creator with tutorials – these are must see!

“A little experiment I worked on using LivePortrait, Viggle, and ComfyUI. Will be doing a breakdown soon! 

“BTS to my latest video. 

Top 80 Links of The Week – Organized by Category

Agents and Copilots 

“🚨 OpenAI CEO Sam Altman confirms that Level-3 Agents are coming soon ” The shift to level 2 took time, but it accelerates the development of level 3. This will enable impactful agent-based experiences that will greatly impact technology advancements in technology ” 

“.@OpenAI is hiring ML engineers for a new multi-agent research team! We view multi-agent as a path to even better AI reasoning. Prior multi-agent experience isn’t needed. If you’d like to research this area with @kevinleestone and me fill out this form: 

“o1-preview is the first model to actually pull off a full sestina. Every other model has failed, though Claude 3.5 gets so close before flubbing at the envoi (the pattern of the last three lines). Also the summarized list of thoughts when you ask o1 to write poetry is something. 

“The “thinking” process in o1 is just a brilliant bit of user experience. Though it may or may not represent the actual chain of thought, it makes the oracular answer at the end feel understandable and explainable (even if it doesn’t contain actual information?) 

“In the past few days, I’ve been testing OpenAI o1 models, mostly o1-mini, for developing PhD or postdoc level projects. I can confidently claim that the o1 model is comparable to an outstanding PhD student in biomedical sciences! I’d rate it among the best PhDs I’ve have trained! 

“The confusion and skepticism from technologists about o1 is remarkably similar to the early response to GPT3 and ChatGPT. “That’s all it does?” “It’s just ….” “How is this different from …?” “Don’t really get it” Meanwhile a small number of people are whispering about wild” / X

“Is OpenAI’s o1 a good calculator? We tested it on up to 20×20 multiplication—o1 solves up to 9×9 multiplication with decent accuracy, while gpt-4o struggles beyond 4×4. For context, this task is solvable by a small LM using implicit CoT with stepwise internalization. 1/4 

“After spending a decent amount of time with o1-preview, I would be very surprised if it is not able to do economically valuable analytical work inside large companies. The main issue is that prompting it remains really weird. But a real R&D effort inside firms might crack that.” / X

“No more waiting. o1’s is officially on Chatbot Arena! We tested o1-preview and mini with 6K+ community votes. 🥇o1-preview: #1 across the board, especially in Math, Hard Prompts, and Coding. A huge leap in technical performance! 🥈o1-mini: #1 in technical areas, #2 overall. 

“The new Siri setup in iOS 18.1 Beta looks so satisfying to watch. 

Apple begins testing AI software designed to bring a smarter Siri to the iPhone 16

AGI (Artificial General Intelligence) 

“Love this diagram outlining the progressive frontiers of human knowledge, culminating in a PhD. Now if I had to visualize AGI, it would be that PhD dent below, except on every point of the sphere 🌐➡️🔴 

“Stuart Russell says the future of artificial general intelligence is more likely to be 100s of millions of robots controlled by one giant global brain than each robot having its own independent brain 

Amazon

“Just launched 🚀Amazon’s new GenAI-powered assistant for sellers on our retail site, codenamed Project Amelia! I’m excited for our latest GenAI tool that is going to make it even easier for sellers to manage and grow their business. Using the power of Amazon Bedrock, Project 

Amazon introduces Amelia, an AI assistant for third-party sellers

Audio

“Google’s NotebookLM is the current best “wow this is amazing & useful” demo of AI Here I gave it the entire text of my book, it turned it into a podcast, a study guide, FAQ, timeline & quite accurate chat Listen to the first few minutes of the “podcast.” Seriously, just listen. 

Augmented and Virtual Reality (AR/VR) 

“Okay it happened! Snapchat Spectacles AR glasses. Fully standalone. 46 degree field of view. 37 pixels per degree. That’s roughly like a 100” TV screen! 2x snapdragon chips. 45 minutes of battery. Auto transitioning lenses. Designed for co-presence. Spectator mode and more. 

Spectacles

“Dive deeper into the things you love, closer to the people you love, with #Spectacles. Join the Spectacles Developer Program at 

“See the world through a different Lens. 👓 At today’s Snap Partner Summit, we introduced the fifth generation of @Spectacles, powered by Snap OS, our brand-new, groundbreaking operating system designed to enhance how people naturally interact with the world. Learn more: 

Chips, Hardware, and Infrastructure 

“Groq partnered with Aramco to build “the world’s largest AI inference center” The center will initially house 19,000 LPUs but eventually grow to 200,000 in total The data center will reportedly cost over $100 million, funded by Aramco 

Chip Startup Groq Backs Saudi AI Ambitions With Aramco Deal – Bloomberg

“Foundation models have shaken up the digital world… but what about robotics? On this week’s Training Data, @stephzhan and I spoke to @DrJimFan about @Nvidia’s Project GR00T and his quest to build a generalist “foundation agent” that transcends skills, forms, and realities. 

NVIDIA AI Aerial Launches to Optimize Wireless Networks, Deliver New Generative AI Experiences on One Platform | NVIDIA Blog

Ethics/Legal/Security 

Whistles, songs, boings, and biotwangs: Recognizing whale vocalizations with AI

“Breaking reCAPTCHAv2 Solves 100% of the CAPTCHAs presented by reCAPTCHAv2, surpassing the success rates of previous works, which range from 68% to 71% repo: 

Snapchat’s AI selfie feature puts your face in personalized ads — here’s how to turn it off – The Verge

LinkedIn is training AI models on your data – The Verge

China to require labels for AI-generated content as tech brings fresh challenges | South China Morning Post

“Elsewhere on the front, soldiers from the FPV bombing group are launching smaller drones to hit Russians soldiers who have attempted to close in on Ukrainian positions and begun digging in. Yevhenii, whose callsign is “Bird”, puts on his immersive goggles and takes control. “We 

“The voice of Darth Vader, James Earl Jones, sadly passed away recently. But before he did, he agreed to let AI replicate Darth Vader’s voice, helping the character continue in future Star Wars productions. Now, we’ll forever have this iconic voice: 

James Earl Jones’ controversial AI decision will let Darth Vader live on, but it raises concerns among actors | Fox News

“If you do your thinking in English vs Hindi vs Chinese, I wonder how that influences your chain of thought 

UN advisory body makes seven recommendations for governing AI | Reuters

US to convene global AI safety summit in November | Reuters

“The Safety and Security Committee—a committee established to review critical safety and security issues—has made recommendations across five key areas, which we are adopting. 

An update on our safety & security practices | OpenAI

OpenAI is launching an ‘independent’ safety board that can stop its model releases – The Verge

Google 

“Google’s new AI model can identify vocalizations from 8 whale species, including the mysterious “Biotwang.” It classifies 12 call types across a wide acoustic range, and the model is available for free on Kaggle AI’s reach is extending to the depths of the ocean! 

International

Middle Eastern funds plowing billions into the hottest AI start-ups

Chip Startup Groq Backs Saudi AI Ambitions With Aramco Deal – Bloomberg – https://www.bloomberg.com/news/articles/2024-09-16/chip-startup-groq-backs-saudi-ai-ambitions-with-aramco-deal

TikTok owner ByteDance taps TSMC to make its own AI GPUs to stop relying on Nvidia — the company has reportedly spent over $2 billion on Nvidia AI GPUs | Tom’s Hardware

Microsoft

“Microsoft introduced agents coming to Microsoft 365 Copilot The agents, inspired by GPTs, work in Teams, Windows, Edge, Word, SharePoint, or a website It’s becoming increasingly clear agents are the future of boring tech work 

Copilot Pages is Microsoft’s new collaborative AI playground for businesses – The Verge

“Great launch @Microsoft With Copilot Studio, you can create and publish Copilot agents to all kinds of channels, including Microsoft Teams, websites, or mobile apps. They have already seen 50,000 organizations use Copilot Studio. 

“Microsoft, BlackRock and the UAE have formed GAIIP – the Global Artificial Intelligence Infrastructure Investment Partnership – to invest $100 billion in datacenters and power infrastructure, mainly inside the United States. 

“Speaking of data centers, Microsoft and Blackrock are raising $30 billion to invest in new and existing AI data centers The investment could total $100 billion if the demand for AI infrastructure warrants it Safe to say the AI “bubble” isn’t bursting yet 

“Microsoft and BlackRock are part of a group of companies collaborating to pull together billions of dollars to develop data centers for artificial intelligence and the energy infrastructure to power them. Read more: 

Mobile 

“The new Siri setup in iOS 18.1 Beta looks so satisfying to watch. 

Apple begins testing AI software designed to bring a smarter Siri to the iPhone 16

NVIDIA AI Aerial Launches to Optimize Wireless Networks, Deliver New Generative AI Experiences on One Platform | NVIDIA Blog

OpenAI 

“When an expert realizes that o1 could write his PhD code (that took him a year) in 1 hour. Video: 

“Open Dataset release by @OpenAI! 👀 OpenAI just released a Multilingual Massive Multitask Language Understanding (MMMLU) dataset on @huggingface! 🌍 MMLU test set available in 14 languages, including Arabic, German, Spanish, French,…. 🧠 Covers 57 categories from elementary to 

“In the past few days, I’ve been testing OpenAI o1 models, mostly o1-mini, for developing PhD or postdoc level projects. I can confidently claim that the o1 model is comparable to an outstanding PhD student in biomedical sciences! I’d rate it among the best PhDs I’ve have trained! 

“.@OpenAI is hiring ML engineers for a new multi-agent research team! We view multi-agent as a path to even better AI reasoning. Prior multi-agent experience isn’t needed. If you’d like to research this area with @kevinleestone and me fill out this form: 

“o1-preview is the first model to actually pull off a full sestina. Every other model has failed, though Claude 3.5 gets so close before flubbing at the envoi (the pattern of the last three lines). Also the summarized list of thoughts when you ask o1 to write poetry is something. 

“The “thinking” process in o1 is just a brilliant bit of user experience. Though it may or may not represent the actual chain of thought, it makes the oracular answer at the end feel understandable and explainable (even if it doesn’t contain actual information?) 

“The confusion and skepticism from technologists about o1 is remarkably similar to the early response to GPT3 and ChatGPT. “That’s all it does?” “It’s just ….” “How is this different from …?” “Don’t really get it” Meanwhile a small number of people are whispering about wild” / X

“Is OpenAI’s o1 a good calculator? We tested it on up to 20×20 multiplication—o1 solves up to 9×9 multiplication with decent accuracy, while gpt-4o struggles beyond 4×4. For context, this task is solvable by a small LM using implicit CoT with stepwise internalization. 1/4 

“After spending a decent amount of time with o1-preview, I would be very surprised if it is not able to do economically valuable analytical work inside large companies. The main issue is that prompting it remains really weird. But a real R&D effort inside firms might crack that.” / X

“No more waiting. o1’s is officially on Chatbot Arena! We tested o1-preview and mini with 6K+ community votes. 🥇o1-preview: #1 across the board, especially in Math, Hard Prompts, and Coding. A huge leap in technical performance! 🥈o1-mini: #1 in technical areas, #2 overall. 

Podcasts/YouTube/Op-Eds

“You can tell the RL is done properly when the models cease to speak English in their chain of thought” / X

Publishing 

“🚀Your AI toolkit just got a major upgrade! I updated the Journalists on @huggingface community’s collection with tools for investigative work, content creation, and data analysis. New additions include: – @dangerscarf’s excellent 6-part video series on AI for investigative 

“Great project by @rgibbs during in the AI journalism Lab @newmarkjschool: 1. take an llm to classify news articles according to the user needs wheel, 2. suggest other potential angles in other areas of the wheel #ona24 

“🛠️ @fdaudens, @huggingface: “Open-Source AI x Journalism: How to Leverage Hugging Face (Intermediate Level)” 

“Thrilled to see CUNY launching two new AI Journalism Labs in 2025! 🚀 As a coach in this year’s lab, I can say firsthand: it’s A+ for developing AI skills in journalism. – Best speakers, great group discussions – Focus on experimentation & prototyping It truly empowers 

“How The Economist is using AI to extend its global reach—Great interview with @econoscribe My takeaway: being strategic and focusing on products rather than just the tech. 

Robotics and Embodiment

“1X just developed an AI-powered virtual simulator for robots called ‘World Model’ It can predict complex object interactions and imagine multiple future scenarios to help robots better navigate the real world. Dishes and laundry will soon be chores of the past (!) 

“Foundation models have shaken up the digital world… but what about robotics? On this week’s Training Data, @stephzhan and I spoke to @DrJimFan about @Nvidia’s Project GR00T and his quest to build a generalist “foundation agent” that transcends skills, forms, and realities. 

Science and Medicine

“The Blindsight device from Neuralink will enable even those who have lost both eyes and their optic nerve to see. Provided the visual cortex is intact, it will even enable those who have been blind from birth to see for the first time. To set expectations correctly, the vision 

“We have received Breakthrough Device Designation from the FDA for Blindsight. Join us in our quest to bring back sight to those who have lost it. Apply to our Patient Registry and openings on our career page 

Video 

“Woah. Snapchat just launched an AI video generator! Plus, AI portrait generation that needs just one photo of you. Snaps are about to get a lot more expressive. 

“CogVideoX image-to-video is really good for timelapse videos 

“Adobe has just previewed its new Firefly Video Model, set to revolutionize video editing in software like Premiere Pro. Available in beta later this year, this tool promises enhanced workflows, allowing editors to experiment, fill gaps, and even add new elements seamlessly. 

“Kling AI def nailed it with their motion brush implementation — a good way to exert control and tame the chaos / slot machine nature of video diffusion models. The UX reminds me of DragGAN, but better thanks to the segmentation. Hope all other AI video tools learn from this.” / X

“Prompt: [Soldier stands up and walk to the left] This is insane! 

“Runway’s new video-to-video AI is amazing for reskinning classic video game cut scenes. Here’s N64 Golden Eye remastered. Wish we could pass in an object id / semantic segmentation pass to get the finer details right e.g. Bond’s gun NOT catching fire 😂 

“Gen-3 Alpha Video to Video is now available on web for all paid plans. Video to Video represents a new control mechanism for precise movement, expressiveness and intent within generations. To use Video to Video, simply upload your input video, prompt in any aesthetic direction 

Runway News | Runway Partners with Lionsgate
