This week’s cover image represents the spark of two AIs talking to one another. We saw this in action when OpenAI connected two AI agents via video call. Each AI was able to describe the room to the other, ask questions, and learn about and explore the other’s environment. The image prompt is “A dark, starry sky with two AI agents formed from constellations, their hands reaching out towards each other with a bright spark of light between them. --ar 5:3 --style raw”. The font is Colfax, representing the OpenAI logo font.
I decided to try a theme with this week’s cover imagery to see how creative MidJourney could be with simple prompts. Each category cover image is a name tag + art style. It was pretty neat to see the variances. The goal is not perfection. By posting the mistakes, we’ll get to see how imagery improves over time.
Executive Summary
Google and OpenAI duked it out this week: Google I/O unveiled a TON of updates for its family of Gemini models. All of the updates are in the links of the week section below. However, OpenAI knew Google’s event was this week and flanked the conference with spectacular product updates before and after. This was the most product-packed week in the 33 weeks that I’ve been doing this. All of the features and links are in the top links section; this executive summary covers only the high-level, must-know news.
Google Earned the Must-See Video of the Week – Google I/O Unveils Multimodal AI Assistant: Google I/O featured a remarkable real-time demo of a multimodal AI assistant. This means it can see and interact with your environment as if it were a very smart friend on FaceTime. The video shows an AI assistant not only describing the scene via video chat but also remembering where an apple and reading glasses were earlier in the conversation (incredible context window). “We’re sharing Project Astra: our new project focused on building a future AI assistant that can be truly helpful in everyday life. Watch it in action, with two parts – each was captured in a single take, in real time.” @GoogleDeepMind
OpenAI Outshined Google: Here are the Key OpenAI Developments
- ChatGPT in Meetings: “ChatGPT Desktop participating in a Zoom meeting keeps track of different speakers including names & summarizes conversation. When can we delegate the entire meeting to the bot?” @8teAPi
- Voice/Video/Text Model Advancements: “The voice/video/text model from OpenAI GPT-4o can now act as a senior-pro pair programmer for every person on Earth. It’s not just ‘programming’ but any job on the PC. It can see the camera stream via glasses/devices, so it’s pretty much any existing job.” @johnrushx
- Interactive Turing Test Achievement: “The first robust empirical demonstration that any artificial system passes an interactive 2-player Turing test. GPT-4 was judged to be human by other humans 54% of the time (though humans were judged to be human 67% of the time).” @emollick
- Real-Time Student Assistance: “A student shares their iPad screen with the new ChatGPT + GPT-4o, and the AI speaks with them and helps them learn in real-time. Imagine giving this to every student in the world.” @mckaywrigley
- AI Singing Demo: “This demo of two GPT-4o’s singing to each other is one of the craziest things I’ve ever seen.” @mattshumer_
- Performance and Capabilities: “GPT-4o is our new state-of-the-art frontier model. We’ve been testing a version on the LMSys arena as im-also-a-good-gpt2-chatbot. Here’s how it’s been doing.” @LiamFedus “People are misunderstanding GPT-4o. It isn’t a text model with a voice or image attachment. It’s a natively multimodal token in, multimodal token out model. You want it to talk fast? Just prompt it to. Need to translate into whale noises? Just use few-shot examples.” @willdepue
- Custom GPT Store: OpenAI’s custom GPT Store is now open to all for free. The Verge
- Executive Departures: “After almost a decade, I have made the decision to leave OpenAI. The company’s trajectory has been nothing short of miraculous, and I’m confident that OpenAI will build AGI that is both safe and beneficial under the leadership of @sama, @gdb, @miramurati.” @ilyasut “I’m leaving @OpenAI after 3½ yrs. I’ll be joining my good friend Andy Barry (Boston Dynamics) + @peteflorence & @andyzeng_ (DeepMind) on a brand new initiative! I think this will be necessary to fully realize AGI in the world and am excited to share more about it soon.” @E0M
- Concerns Over AGI Safety: “OpenAI co-founder Ilya Sutskever announced that he is leaving the company. This follows months of speculation of Sutskever’s role from the November 2023 Sam Altman ousting. Alongside him, superalignment group co-lead Jan Leike announced his departure.” @rowancheung “OpenAI’s safety experts keep JUMPING SHIP, citing extreme P(doom) and lack of confidence that humanity will survive AGI. CEOs like Sam Altman, saying risk is manageable, are hiding the truth. The truth is that their own top safety experts have freaked out about P(doom) and quit!” @liron
Revolutionizing Education with OpenAI’s GPT-4o: “This demo is insane. A student shares their iPad screen with the new ChatGPT + GPT-4o, and the AI speaks with them and helps them learn in realtime. Imagine giving this to every student in the world.” GPT-4o (“o” for “omni”) is advancing towards more natural human-computer interaction, available for free users and via the API. Despite needing some improvements to enhance its tutoring capabilities, the potential transformation in education is undeniable. Via @mckaywrigley, @AlphaSignalAI, @emollick
Google Pressing Forward with AI Chips and Hardware: “Trillium, Google’s latest generation of TPUs, offers a 4.7x improvement in compute performance per chip compared to the previous TPU v5e, as announced at #GoogleIO”. @Google
Apple-OpenAI Deal Finalized: “Apparently, the Apple-OpenAI deal just closed! One day before the voice assistant announcement. The new Siri will be from OpenAI” @bindureddy.
Google’s New Video Model, Veo: “Introducing Veo: our most capable generative video model. It can create high-quality, 1080p clips that can go beyond 60 seconds. From photorealism to surrealism and animation, it can tackle a range of cinematic styles.” @GoogleDeepMind
Instagram’s Mike Krieger Joins Anthropic as Chief Product Officer: Instagram co-founder Mike Krieger has announced his new role as Chief Product Officer at Anthropic AI. via @mikeyk and The Verge
ElevenLabs Dubbing API Launch: “We’re excited to launch the ElevenLabs Dubbing API, enabling any developer to add audio or video translation to their product while preserving the unique characteristics of the original speaker’s voices.” @elevenlabsio
Autonomous Vehicles: Waymo’s robotaxis are now providing 50,000 paid trips weekly, marking significant progress in autonomous vehicle adoption. Engadget
Meta’s AI Headphones Initiative: “Meta is reportedly building headphone-based AI hardware, codenamed ‘Camerabuds,’ with two outward-facing cameras to detect surroundings and provide real-time AI assistance. The best form factors aren’t intrusive—will people wear headphones all the time? A worthy experiment.” @pitdesi
A handful of publishing updates:
- Ad Targeting on X: “Advertisers can now reach users on X simply by describing who they want to target.” @xDaily
- X Launches AI-Generated News Summaries: X is using Grok to publish AI-generated news summaries. Engadget
- AI-Generated Search Results on TikTok: TikTok is testing AI-generated search results. The Verge
- OpenAI and Reddit Partnership: OpenAI strikes a deal with Reddit to train its AI on user posts. The Verge
Robotics demonstration videos worth watching:
- Boston Dynamics Spot at The Robot Challenge Course 2024:
- “Boston Dynamics’ Spot makes an appearance at the #ICRA2024 Quadruped Robot Challenges. Operated by remote control.” @ystk_hara
- Unitree G1 Humanoid Agent:
- “Unitree Introducing | Unitree G1 Humanoid Agent | AI Avatar Price from $16K 🤩 Unlock unlimited sports potential(Extra large joint movement angle, 23~34 joints) Force control of dexterous hands, manipulation of all things Imitation & reinforcement learning driven #Unitree #AI” @UnitreeRobotics
Top 58 Links of The Week
- Apple
- Apple’s New ChatGPT Deal—Here’s What It Means For iPhone Security
- iOS 18: Apple finalizing deal to bring ChatGPT to iPhone – 9to5Mac
- Apple Closes in on Deal With OpenAI to Put ChatGPT on iPhone – Bloomberg
- Apple Launches New Accessibility Features Along with Eye Tracking Integration for iOS, iPadOS | Tech Times
- Apple announces new accessibility features, including Eye Tracking – Apple
- Audio
- 40,000 AI-narrated audiobooks flood Audible, dividing authors and listeners | TechSpot
- “Together with @YouTube, we’ve been building Music AI Sandbox, a suite of AI tools to transform how music can be created. 🎵 To help us design and test them, we’ve been working closely with musicians, songwriters and producers. ↓ #GoogleIO
- Augmented and Virtual Reality (AR/VR)
- “Spatial Intelligence is a critical piece of the AI puzzle. This is my 2024 TED talk about the journey from evolution to AI, on how we build Spatial Intelligence. “Sight turned into insight; Seeing became understanding; Understanding led to action. All these gave rise to”
- Fei-Fei Li: With spatial intelligence, AI will understand the real world | TED Talk
- Business and Enterprise
- “In 2023, about 25 percent of Alaska flights used an AI system to shave a few minutes off flight times. Those efficiencies added up to about 41,000 minutes of flying time and half a million gallons of fuel saved
- Verizon unveils new AI tools to transform customer experience | News Release | Verizon
- Ethics/Legal/Security
- Sam Altman’s eye-scanning Worldcoin banned in Spain | Reuters
- Worldcoin booms in Argentina amid 288% inflation – Rest of World
- 2024 Gen Z and Millennial Survey: Living and working with purpose in a transforming world
- Bumble Founder Says AI Could Date for You – Business Insider
- Prepare to Get Manipulated by Emotionally Expressive Chatbots | WIRED
- Google Gemini
- “Today, we’re excited to introduce a new Gemini model: 1.5 Flash. ⚡ It’s a lighter weight model compared to 1.5 Pro and optimized for tasks where low latency and cost matter – like chat applications, extracting data from long documents and more. #GoogleIO (plus a huge context window). Flash has a one-million-token context window by default, which means you can process one hour of video, 11 hours of audio, codebases with more than 30,000 lines of code, or over 700,000 words.
- Google is building its Gemini Nano AI model into Chrome on the desktop. At the Google I/O 2024 developer conference on Tuesday, Google announced that it is building Gemini Nano, the smallest of its AI models, directly into the Chrome desktop client, starting with Chrome 126.
- “Today we have published our updated Gemini 1.5 Model Technical Report. As @JeffDean highlights, we have made significant progress in Gemini 1.5 Pro across all key benchmarks; TL;DR: 1.5 Pro > 1.0 Ultra, 1.5 Flash (our fastest model) ~= 1.0 Ultra.
- “One other thing in the updated Gemini 1.5 Pro report: we show how a research model that is a mathematics-specialized version of 1.5 Pro achieves a record score of 91.1% on the MATH benchmark (the SOTA just 3 years ago, in May, 2021 was 6.9%!).”
- “Introducing LearnLM: our new family of models based on Gemini and fine-tuned for learning. LearnLM applies educational research to make our products — like Search, Gemini and YouTube — more personal, active and engaging for learners. #GoogleIO
- “Education is one of the areas in which LLMs can do the most immediate good, even with their limitations, so I was excited to see that Google is fine tuning a tutor LLM. Also, the comparison they used was Gemini 1.0 running a variation of our tutor prompt! The prompt alone did ok
- “Whether you need a yoga bestie or calculus tutor, in the coming months you’ll be able to customize Gemini, saving time when you have specific ways you interact with Gemini again and again. We’re calling these Gems. #GoogleIO
- Google I/O additional news
- Google I/O 2024: Here’s everything Google just announced | TechCrunch
- Google I/O 2024: New generative AI experiences in Search
- “Coming soon, we’ll bring new multi-step reasoning capabilities to Google Search. It breaks your bigger question down into parts and figures out which problems to solve and in what order, so research that might’ve taken you minutes or even hours can be done in seconds. #GoogleIO
- “Trillium is our latest generation of TPUs and delivers a 4.7x improvement in compute performance per chip over the previous generation, TPU v5e. #GoogleIO”
- “This summer, we’re expanding Gemini’s multimodal capabilities — including the ability to have an in-depth two-way conversation using your voice. This new experience is called Live. #GoogleIO
- “Sir Demis Hassabis just showed a super low latency demo of Google’s multimodal AI assistant on your phone AND augmented reality glasses. Clearly they’ve been cooking this for a while. The race is on!
- “Google is doing something very interesting by building specialized versions of its frontier models for math, healthcare, and education (so far). The benchmarks on all of these are pretty impressive, and it seems to be beyond what can be done with traditional fine tuning alone.”
- Imagery News
- Google (Imagen): “We’re introducing Imagen 3: our highest quality text-to-image generation model yet. 🎨 It produces visuals with incredible detail, realistic lighting and fewer distracting artifacts. From quick sketches to very high-res imagery, here’s a look at what it can create. 👀 #GoogleIO
- MidJourney: “We’re now testing ‘private creation rooms’ on our website! For all MJ members with >100 images, just click ‘rooms’ and then ‘create room’. This lets you make private spaces to create and explore with friends or coworkers. Have fun and let us know what you think!”
- International
- “Not enough people are talking about the fact that OpenAI FINALLY tokenizes different languages better! I classified all the tokens on ‘o200_base’, the new tokenizer for GPT-4o and at least 25% of the tokens are in different languages. No more spending 4x for non-English!
- “2x cheaper & faster is for English, but for other languages (especially non-latin-script) expect – thanks to our new tokenizer — even up to 9x cheaper/faster!”
- Mobile
- I/O 2024: New ways to experience Google AI on Android
- OpenAI
- “OpenAI desktop app deeply integrated into my day to day environment. Not sure if I’ll ever need to rely on search by default. This is pretty existential for search engines.”
- “GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction. It will be available for free users and via the API. If you enjoyed this, check out
- “the first time gpt-4o spoke back to me in real-time, it became clear that we built something completely new – and that what we are building is the future of human-computer interaction. come build this real-time future with us.
- “In case you missed it somehow, OpenAI unveiled GPT-4o. It’s a new advanced multimodal model that integrates text, vision, and audio processing and is free for ALL users. I did a thread here on all the most incredible use cases it’s unlocked so far:
- “So @BeMyEyes has been privately playing with advanced access to @OpenAI’s new GPT-4o model. It’s pretty awesome and here is some video proof. Thank you to the OpenAI team including @JessicaShieh, for your partnership.
- Hello GPT-4o | OpenAI
- Test Driving ChatGPT-4o (Part 2)
- “A new version of the most common AI benchmark, MMLU, was just released, with a bunch of improvements, like increasing the number of multiple choice answers from 4 to 10, and adding more reasoning questions. It seems to be a better test. GPT-4o looks like a big improvement.
- “Underappreciated technical improvement in GPT-4o is that it is no longer lazy at all. It produces a ton of work and doesn’t dodge commands. We have been running some experiments and it is like having the old March version of GPT-4 back.”
- “The speed and extra coding oomph of GPT-4o make it really powerful at analysis compared to GPT-4. “Analyze this. Visualize it. Do sophisticated analysis” Given a dataset of superheroes and no other context, it does really impressive visualization, PCA, clustering analysis…
- “The real-time audio/video in, audio out in gpt-4o is sick HUGE step change in UXs. more and more people are going to be talking to their AI”
- “OpenAI seems to be working on having phone calls inside of chatGPT. This is probably going to be a small part of the event announced on Monday. (1/n)
- Jim Fan walks through the highlights of the OpenAI announcements.
- “4D Chess: Seems OpenAI planned all along to sandwich Google I/O announcements with releases. Step 1: Release solid multi modal model day before Google I/O to take the steam out of their presentation. Step 2: Clean up any Google I/O wins with June GPT-5 release. ———— GPT-5 is”
- “sam altman is a genius master class strategist—he used the enemy of my enemy principle to perfection. 1) he neutralized elon threat completely. 2) negotiated an incredible deal with satya for infinite compute & forever customer. 3) now negotiated a deal with apple to make openai
- Publishing
- “This journalism professor made a NYC chatbot in minutes. It actually worked.
- OpenAI and Reddit Partnership | OpenAI
- Robotics and Embodiment
- Video: $16k G1 humanoid rises up to smash nuts, twist and twirl
- “Unitree G1 Humanoid Price from $16K G1 has 23-43 joints and offers extra large joint angle limits Notably, it has really usable 3 fingered hands in addition to anthropomorphic hands with Force control (in some variants).
- Science and Medicine
- PRIME Study Progress Update — User Experience | Blog | Neuralink
- Rad AI, a startup that helps radiologists save time on report generation, raises $50M Series B from Khosla Ventures | TechCrunch
- Scientists use generative AI to answer complex questions in physics | MIT News | Massachusetts Institute of Technology
- “Sample of “1.5 million quasi-random assignments in US military emergency departments.” patients who outranked physicians enjoyed more physician effort and better health outcomes because more resources were inequitably invested on them.”
The Rest: AI News of The Week
Don’t let the volume overwhelm you. Have fun and skim these. The links are organized by topic, sorted from ‘coolest’ to ‘least cool’, and each topic is clearly defined with a headline. I’ve added a description and glossary of what the topics mean, beneath each label, in plain language. I do the work so you don’t have to! When you visit the pages, note that the links and descriptions are often pulled directly from tweets or articles, so it’s not always my voice. Pause when you see something that interests you. Reach out to me any time. I enjoy sharing and discussing these items.
Agency/Agents/Copilots News of the Week: Agency is when AI can do things for you (like Googling an actress’s name or fetching the latest weather forecast). An agent is one step further, when AI is given autonomy to take action on your behalf (“Alexa, book a reservation for three at Peak in Hudson Yards for Friday night”). A co-pilot is an assistant (like spell check or autofill).
This week’s latest agent news: https://ethanbholland.com/2024/05/17/agents-and-copilots-ai-news-week-ending-05-17-2024/
Amazon News of The Week: Individual company products will often be placed in the categories they match (image, audio, agents, robots, etc). Occasionally, I’ll dedicate space to a company’s news if it’s broad or a major product release.
This week’s latest Amazon AI news: https://ethanbholland.com/2024/05/17/amazon-ai-news-week-ending-05-17-2024/
Anthropic News of the Week:
Anthropic is a company that builds LLMs like OpenAI, Mistral, Meta, etc. Their main AI brand is Claude. As with Amazon and Apple, individual Anthropic company posts will often be placed in the categories they match (image, audio, agents, robots, etc). Occasionally, I’ll dedicate space to a company’s news if it’s broad or a major product release.
This week’s Anthropic news: https://ethanbholland.com/2024/05/17/anthropic-news-week-ending-05-17-2024/
Apple News of the Week: As with Amazon, individual Apple company products will often be placed in the categories they match (image, audio, agents, robots, etc). Occasionally, I’ll dedicate space to a company’s news if it’s broad or a major product release.
This week’s latest Apple AI news: https://ethanbholland.com/2024/05/17/apple-ai-news-week-ending-05-17-2024/
Artificial General Intelligence (AGI) News of the Week: Artificial General Intelligence, in a nutshell, is when artificial intelligence is able to beat humans at everything (including embodying physical forms and completing physical tasks). It’s usually a thought catalyst for predictions, like when AGI will occur. 10 years? 25 years? 100? AGI is an event horizon that is tough to define, tough to imagine, and tough to predict. OpenAI defined AGI in its charter as “highly autonomous systems that outperform humans at most economically valuable work”. OpenAI has a section of its website dedicated to AGI. Google’s DeepMind published my favorite report on the five levels of artificial intelligence on the way to AGI (see also here).
This week’s latest Artificial General Intelligence (AGI) news: https://ethanbholland.com/2024/05/17/artificial-general-intelligence-agi-news-week-ending-05-17-2024/
AI Audio News of the Week: In this case, AI audio can mean a few things. The first is “generative audio,” which refers to creating sounds with AI, much like ChatGPT writes words or MidJourney creates images. For example, asking for the “sound of waves crashing on the beach” would be text to sound. Another example would be an AI ‘watching’ a video and adding sound to it, like a foley artist would add footsteps or a creaking door to a movie scene. Lastly, AI audio can refer to microphones that only pick up certain speakers’ voices or headsets that cancel out all voices but your friends’.
This week’s latest AI audio news: https://ethanbholland.com/2024/05/17/audio-news-week-ending-05-17-2024/
Autonomous Vehicles/Driverless Cars News of the Week: Driverless car news doesn’t always get its own category, because it’s so close to robot embodiment. I go with my gut each week around what to place in each category. My recommendation would be to follow Robotics/Embodiment also, as the two fields are converging.
This week’s autonomous vehicle news: https://ethanbholland.com/2024/05/17/autonomous-vehicles-news-week-ending-05-17-2024/
Augmented and Virtual Reality (AR/VR) News of the Week: Augmented reality is when you see images or information on top of the real world. A car windshield with a heads-up display of the speed. Or glasses that have facial recognition and overlay the names of everyone in view. Virtual reality is when you are transported into another place, usually wearing goggles, but a flight simulator could also be considered virtual reality.
This week’s latest AR/VR news: https://ethanbholland.com/2024/05/17/augmented-and-virtual-reality-ar-vr-news-week-ending-05-17-2024/
Business/Enterprise News of the Week: This broad category is for stories that impact corporations and large-scale AI implementation. Enterprise refers to a type of AI that is often custom-built for a business or leverages an API to connect secure data to an AI model.
This week’s latest enterprise AI news: https://ethanbholland.com/2024/05/17/business-and-enterprise-ai-news-week-ending-05-17-2024/
Chips and Hardware AI News of the Week: Most of the chip news is usually about NVIDIA, yet more and more, Meta, Google, and OpenAI are moving toward their own manufacturing. I have to make the call whether to put Meta, Google, and OpenAI’s chip news under this section or their company sections. Lately, I’m putting each company’s chip news into the company category, rather than the chips category. This is the rest of the chips headlines.
This week’s latest chips and hardware news: https://ethanbholland.com/2024/05/17/chips-and-hardware-week-ending-05-17-2024/
Consumer Electronics AI News of the Week: This is a broad category meant to capture end-user tools and products that incorporate artificial intelligence into their features, from high-end grills to smartphones.
This week’s latest consumer AI news: https://ethanbholland.com/2024/05/17/consumer-products-week-ending-05-17-2024/
Education AI News of the Week: There is a lot of buzz around the impact of AI in education. This section focuses both on the risks and rewards of how AI can impact learning. It’s broader than just K-12 and includes things like skills, trade, professional, and higher education. This is not about how to learn AI, it’s about AI’s impact on learning.
This week’s latest education news: https://ethanbholland.com/2024/05/17/education-ai-news-week-ending-05-17-2024/
Ethics/Legal/Security AI News of the Week: This section focuses on the impact AI is having on ethics (deep fakes, war, trust, false information, plagiarism, job loss, income), legal (rights, laws, regulations), and security (hacking, phishing, national interests, safety). For huge news stories like the NY Times suing OpenAI, I usually put them under the main section or give them their own page.
This week’s latest AI ethics/legal/security news: https://ethanbholland.com/2024/05/17/ethics-legal-security-ai-news-week-ending-05-17-2024/
Google AI News of the Week: Individual company products will often be placed in the categories they match (image, audio, agents, robots, etc). Occasionally, I’ll dedicate space to a company’s news if it’s broad or a major product release.
This week’s latest Google AI news: https://ethanbholland.com/2024/05/17/google-ai-news-week-ending-05-17-2024/
Imagery News of the Week: AI imagery covers “generative AI” image tools. This is usually text-to-image, where a user enters a prompt (“a polar bear walking through NYC”) and a tool like DALL-E or MidJourney generates an image in the likeness of the description. This is different than AI vision, where an AI “looks at” an image and can derive context, details, and contents. AI vision is a subset of AI called multimodality. Imagery, in this case, is for image creation and modification/editing. Adobe Photoshop’s AI tools would fall into this category. I’ll also include things like automatic masking and object removal, even though that’s in between imagery and vision… but practically speaking it fits into editing.
This week’s latest AI image news: https://ethanbholland.com/2024/05/17/imagery-news-week-ending-05-17-2024/
International AI News of the Week: A lot of international news will get cross-listed in the chips, security, or open-source categories; however, it’s nice to have a separate category for worldwide AI news.
This week’s latest international AI news: https://ethanbholland.com/2024/05/17/international-ai-news-week-ending-05-17-2024/
Locally Run AI Models News of the Week: This is a niche mostly for serious AI followers. It refers to AI that can be privately downloaded and run on a device without an internet connection. These have an array of powerful implications, from ethics of rogue users with untethered agents, to practical uses like Apple running a full AI on your phone, to corporate installations for security, to embodied robots with AI running in their virtual brain.
This week’s latest locally run AI news: https://ethanbholland.com/2024/05/17/locally-run-ai-models-news-week-ending-05-17-2024/
Meta AI News of the Week: This is a space dedicated to Meta-specific AI advancements and news stories.
This week’s Meta AI news: https://ethanbholland.com/2024/05/17/meta-ai-news-week-ending-05-17-2024/
Microsoft AI News of the Week: This is a space dedicated to Microsoft-specific AI advancements and news stories.
This week’s Microsoft AI news: https://ethanbholland.com/2024/05/17/microsoft-ai-news-week-ending-05-17-2024/
Mobile AI News of the Week: In April 2024, I added a dedicated category for mobile. Previously, I put most of the mobile news into either the company categories (Apple v. Google v. Microsoft) or locally run AI. It also ended up in the chips and hardware section, or the consumer products category. There is enough mobile news to at least start cross-linking it all in one place.
This week’s latest mobile AI news: https://ethanbholland.com/2024/05/17/mobile-news-week-ending-05-17-2024/
Multimodal AI News of the Week: This is a broad topic for a single AI model that demonstrates an ability to interact with more than one modality (imagery, video, audio, text). Often multimodal news will end up in one of these categories. I’m playing it by ear on a case-by-case basis. Please be patient with my organizational challenges.
This week’s multimodal AI news: https://ethanbholland.com/2024/05/17/multimodality-news-week-ending-05-17-2024/
OpenAI: OpenAI is the leading force in the AI boom of 2023 and now 2024. This section focuses on news that is specific to OpenAI. This section will compete with all of the other sections (imagery, vision, ethics, etc) because OpenAI is so broad. I won’t be able to consistently pick when to put things under OpenAI or other sections, so bear with me.
This week’s latest OpenAI news: https://ethanbholland.com/2024/05/17/openai-news-week-ending-05-17-2024/
Open Source Models: An open source AI model refers to a class of artificial intelligence models with public source code. They can be inspected, copied, installed, and customized on private computers. In contrast, a closed source model is proprietary and owned by a company that you pay to use (like PowerPoint or Photoshop). One of the most famous open source language models is a French model called Mistral. Its code is completely publicly available, and anyone can download it and customize it. On one hand, open source is a transparent and powerful way to democratize AI, but on the other hand, open source models circumvent the guard rails and copyright protections that private companies implement. Open source models are the wild west of artificial intelligence, but also the potential saving grace (depending on who you ask). It’s a bit like gun control debates but for computing power.
This week’s latest open source news: https://ethanbholland.com/2024/05/17/open-source-ai-news-week-ending-05-17-2024/
Podcast/YouTube Clips of the Week: This is for more general interviews and explainer videos and podcasts that provide access to leadership, demos of new products, and walkthroughs and tutorials. Videos focused on specific topics will live in the topic category (i.e. images), but broader videos will live here.
This week’s latest podcasts and YouTube clips: https://ethanbholland.com/2024/05/17/podcasts-youtube-op-eds-week-ending-05-17-2024/
Publishing AI News of the Week: These are stories about AI’s impact on the publishing industry. From copyright and crawling to the death of page views or even the end of browsers.
This week’s latest publishing AI news: https://ethanbholland.com/2024/05/17/publishing-news-week-ending-05-17-2024/
RAG (Retrieval-Augmented Generation) News of the Week: RAG allows a language model to “reference an authoritative knowledge base outside of its training data sources before generating a response” (via Amazon). Historically, RAG was prone to hallucinations; however, new methods are improving its reliability. There is enough news about RAG that I want to start tracking it separately for my own use.
This week’s latest RAG (Retrieval-Augmented Generation) AI news: https://ethanbholland.com/2024/05/17/retrieval-augmented-generation-rag-news-week-ending-04-19-2024-5/
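For the curious, here is a minimal sketch of the "retrieve first, then generate" idea behind RAG. Everything in it is an assumption made for illustration: the tiny `KNOWLEDGE_BASE` list (built from headlines in this issue), the word-overlap scoring, and the helper names `retrieve` and `build_prompt` are hypothetical. Real systems typically use embeddings and a vector database, and the final prompt would be sent to a language model, which this sketch leaves out.

```python
import re

# Hypothetical knowledge base; a real system would use a document store.
KNOWLEDGE_BASE = [
    "Trillium is Google's latest generation of TPUs.",
    "Veo is a generative video model from Google DeepMind.",
    "GPT-4o is a multimodal model from OpenAI.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question.

    Word overlap is a stand-in for the embedding similarity search
    that production RAG systems usually rely on.
    """
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents):
    """Place the retrieved text ahead of the question in the prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents))
    return (
        "Answer using only this context:\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


prompt = build_prompt("What is Veo?", KNOWLEDGE_BASE)
print(prompt)
```

The point of the design is that the model is handed the relevant source text at question time, so it can ground its answer in that text rather than relying only on what it memorized during training.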
Robotics/Embodiment News of the Week: This is the most intense area of AI. Embodiment refers to putting an AI inside of a machine. It’s “embodying” the object and therefore giving a robot agency in the real world. An example would be using a large language model as an interface to a complex coding task. Just as you ask “Alexa, play Bad Blood by Taylor Swift on Spotify” using plain language, with embodiment you could ask a robot to “Go to the laundry basket and bring me all of the red shirts”. The language model in the robot would translate your request into the proper code to go get the red shirts. The robot was never trained on the task. Another type of embodiment would be training a robot using virtual reality simulations. Using a simulation, a robot could be trained on thousands of scenarios until the real world can be swapped in and the robot doesn’t “notice”. This section also includes factory automation and human prosthetics. There will be some overlap with other categories like autonomous vehicles. I first learned about embodiment from Alan Thompson. I highly recommend his video explainer: https://youtu.be/peLqYP9BAUg?si=2FzrvDlw-qaQFaCx.
This week’s latest robot and embodiment AI news: https://ethanbholland.com/2024/05/17/robotics-and-embodiment-news-week-ending-05-17-2024/
Science/Medicine AI News of the Week: AI’s strength is learning patterns. This applies nicely to medical diagnosis and identifying trends. When combined with data and AI vision, this means AI is good at looking at X-rays. Language models are helping with patient interface, and robotics and augmented reality are advancing surgery. Powerful enterprise models like Google’s AlphaFold can master protein folding. Other models can read ancient scrolls without opening them.
This week’s latest AI science and medicine news: https://ethanbholland.com/2024/05/17/science-and-medicine-news-week-ending-05-17-2024/
AI Video News of the Week: AI video in this case refers to generative video, much like imagery meant generative imagery. This is usually text-to-video, where a user enters a prompt (“a wizard walking out of a flaming building”) and a tool like Pika or Runway generates a video in the likeness of the description. It also covers animation of still images, where an image is given motion (like a photo of a waterfall appearing to have flowing water). As with images, this is different from AI vision, where an AI “looks at” an image or video and can derive context, details, and contents. Video, in this case, is video creation and modification/editing.
This week’s latest AI video news: https://ethanbholland.com/2024/05/17/video-news-week-ending-05-17-2024/
X/Twitter/Grok: Grok is one of several AIs developed by X, and it’s a bit blended in with Tesla and other Elon Musk technology. Not every week will have a Grok section, but like Meta, Google, Apple, and OpenAI, X will be in the news enough to have its own section.
This week’s latest X news: https://ethanbholland.com/2024/05/17/twitter-x-grok-week-ending-05-17-2024/
Technical and AI Developer News of the Week: Everything that is too technical for general consumption goes here. These are stories I think are important but that might be inaccessible or confusing to a general reader. It’s also a space for developer news and deep dives into how AI works under the hood.
This week’s technical and dev AI news: https://ethanbholland.com/2024/05/17/tech-and-development-week-ending-05-17-2024/
Credits/Sources

Most of these weekly links come from just a few prolific oversharing sources. Please follow them, as they work hard to find the news each week and they make it a lot easier for me to compile.
- Robert Scoble: https://x.com/Scobleizer
- Ethan Mollick: https://www.linkedin.com/in/emollick/
- Alan Thompson: https://lifearchitect.ai/
- Theoretically Media: https://www.youtube.com/@TheoreticallyMedia
- The Rundown: https://www.therundown.ai/
- Bilawal Sidhu: https://twitter.com/bilawalsidhu/
- TLDR: https://tldr.tech/ai
- Jeremiah Owyang: https://twitter.com/jowyang
- Nick St. Pierre: https://twitter.com/nickfloats
- Dr. Jim Fan: https://twitter.com/DrJimFan
- All About AI: https://www.youtube.com/@AllAboutAI
- Marshall Kirkpatrick: https://aitimetoimpact.com/
- AI News (Smol Talk): https://buttondown.email/ainews/archive/
For previous issues, please visit the archives!

Thanks for reading!




