This week’s cover features a man in a courtroom, wearing VR goggles. He’s watching “Virtual Insanity” by Jamiroquai. The video still depicts the Rabbit AI handheld assistant, as if lead singer Jay Kay were showcasing the device. This week was full of “insane” AR/VR news, but it also brought the New York Times lawsuit against OpenAI and more buzz around MidJourney and Magnific. The image was created with MidJourney, upscaled with Magnific, and spruced up with Photoshop. The text is set in Times New Roman.

Executive Summary

Here are the top stories to know, at a high level. I’m showcasing the ones I feel are most important and underreported. 

  • Rabbit: Rabbit is a cool handheld AI assistant that uses voice commands and a scroll wheel, and it aims to replace the entire Apple/Google app store. Instead of traditional apps, you interact with the device using voice commands, similar to how you would use Alexa, to access and connect various services. It has a web interface where you can train it on tasks, and it learns how to browse the web. So if you want to teach it to make images, it will learn the process, and you can then “ask it” to make images using conversational commands. While it may not dominate the market, the demo itself is an educational showcase of future technological trends. Watching the Rabbit launch keynote is my top recommendation for things to know this week: https://youtu.be/22wlLy7hKP4?si=uHtle4Anvi8jUE9x
  • Google Generative 3D Object Insertion: Google’s new text-to-video object insertion tool lets you describe what you want to add to a video, and it will generate it. For example, the prompt “Create a cake on the tray” will produce a cake within a moving 3D scene.
  • Pika AI Video Advances: “Sometimes your scene could use a little extra space—or an extra horse. Expand Canvas is here for you. And when that horse isn’t quite magically cute enough, adjust the style with Video-to-Video.”
  • AI Is the World’s Biggest Threat: Davos says AI-powered misinformation is the world’s biggest short-term threat.
  • OpenAI Replies to the NY Times Lawsuit: OpenAI has responded to the New York Times lawsuit, stating that the Times is not telling the full story. OpenAI also argues that it’s ‘impossible’ to build tools without using copyrighted material. What interests me is that both sides are managing the message extremely aggressively, risking getting ‘out over their skis’ with rhetorical points meant to set precedent at least as much as to make their actual case. It’s a powerful chess match.
  • AI Beats Doctors at Telemedicine: In a controlled test, initial results show AI outperforming humans at diagnostic decisions.
  • OpenAI Launches A “Build Your Own Bot” Store: “Introducing the GPT Store: Over 3M GPTs have been created and now you can find the most useful versions of ChatGPT for you.”
  • L’Oréal Has a LOT of Data: “We already own 10 petabytes of data on our L’Oréal data platform, supporting all types of AI models, including the latest LLMs”.  We often think of consumer brands as users of AI, but their existing ‘big data’ efforts will easily fit into AI strategies in surprising ways.
  • OpenAI’s Vision Capability Is Very Good: Examples include “Find me the snow leopard” and solving the popular “Muffins or dogs?” CAPTCHA.
  • Video Game Characters Are Getting Better At Chatting: “I spoke to an Nvidia AI-powered NPC about his ramen and his responses were frighteningly good.” This video (link below) is powerful. Dynamic player interaction will transform video games and, of course, make us all wonder if we’re already in a simulation.

Top Thirteen Stories 

These are the must-click links if you only have time for a few.  Even if they look boring, click them!  I did the work, so you don’t have to worry.  All are 10/10 would recommend. 

The Rest: AI News of The Week

I’m breaking out the rest of the news into separate posts. Don’t let the volume overwhelm you. Have fun and skim them. The links on each page are organized by topic and sorted from ‘coolest’ to ‘least cool’, and each topic is clearly defined with a headline. Beneath each label, I’ve added a plain-language description and glossary of what the topic means. I do the work so you don’t have to! When you visit the pages, note that the links and descriptions are often pulled directly from tweets or articles, so it’s not always my voice. Pause when you see something that interests you. Reach out to me any time. I enjoy sharing and discussing these items.

  • Agency/Agents/Copilots News: Agency is when AI can do things for you (like Googling an actress’s name or fetching the latest weather forecast). An agent goes one step further: the AI is given autonomy to take action on your behalf (“Alexa, book a reservation for three at Peak in Hudson Yards for Friday night”). A co-pilot is an assistant (like spell check or autofill).
  • AI Audio News: AI audio is usually separated into a few buckets. Generative audio is when you give a description of what you want, and the AI creates it, like ChatGPT but for music. Cloning is similar, but it’s usually for voices or instruments. Selective listening is when a microphone or headset can filter what is picked up or played out based on voices or tones.
  • Augmented and Virtual Reality News (AR/VR): Augmented reality is when you see images or information on top of the real world, such as a car windshield with a heads-up display of the speed, or glasses that have facial recognition and overlay the names of everyone in view. Virtual reality is when you are transported into another place, usually wearing goggles, but a flight simulator could also be considered virtual reality.
  • Autonomous Vehicles: aka “driverless cars”
  • Business/Enterprise: This broad category is for stories that impact corporations and large-scale AI implementation. Enterprise refers to a type of AI that is often custom-built for a business or that leverages an API to connect secure data to an AI model.
  • Consumer Electronics: This is a broad category, but this week was CES in Vegas, so there was a lot of news in the consumer product space.
  • Education: There is a lot of buzz around the impact of AI in education. This section focuses both on the risks and rewards of how AI can impact learning. It’s broader than just K-12 and includes things like skills, trade, professional, and higher education. This is not about how to learn AI, it’s about AI’s impact on learning.
  • Ethics/Legal/Security: This section focuses on the impact AI is having on ethics (deep fakes, war, trust, false information, plagiarism, job loss, income), legal (rights, laws, regulations), and security (hacking, phishing, national interests, safety). For huge news stories like the NY Times suing OpenAI, I usually put them under the main section or give them their own page.
  • Grok/X/Twitter: Grok is the AI chatbot from Elon Musk’s company xAI. It’s offered through X and is a bit blended in with Tesla and other Elon Musk technology. Not every week will have a Grok section, but like OpenAI, Apple, and Google, X will be in the news enough to have its own section.
  • Imagery: AI imagery covers “generative AI” image tools. This is usually text-to-image, where a user enters a prompt (“a polar bear walking through NYC”) and a tool like DALL-E or MidJourney generates an image in the likeness of the description. This is different than AI vision, where an AI “looks at” an image and can derive context, details, and contents. AI vision is part of a broader area of AI called multimodality. Imagery, in this case, is for image creation and modification/editing. Adobe Photoshop’s AI tools would fall into this category. I’ll also include things like automatic masking and object removal; they sit somewhere between imagery and vision, but practically speaking they fit into editing.
  • OpenAI: OpenAI is the leading force in the AI boom of 2023 and now 2024. This section focuses on news that is specific to OpenAI. It will overlap with all of the other sections (imagery, vision, ethics, etc.) because OpenAI is so broad. I won’t always be consistent about whether a story goes under OpenAI or another section, so bear with me.
  • Open Source Models: An open source AI model is a model whose source code (and usually its trained weights) is publicly available. It can be inspected, copied, installed, and customized on private computers. In contrast, a closed source model is proprietary and owned by a company that you pay to use (like PowerPoint or Photoshop). One of the most famous open source language models is a French model called Mistral. Its model weights are publicly available, and anyone can download and customize it. On one hand, open source is a transparent and powerful way to democratize AI, but on the other hand, open source models can circumvent the guardrails and copyright protections that private companies implement. Open source models are the wild west of artificial intelligence, but also the potential saving grace (depending on who you ask). It’s a bit like gun control debates, but for computing power.
  • Robotics/Embodiment: This is the most intense area of AI. Embodiment refers to putting an AI inside a machine. The AI is “embodying” the object, which gives a robot agency in the real world. One example is using a large language model as the interface to a complex coding task. Just as you ask “Alexa, play Bad Blood by Taylor Swift on Spotify” using plain language, with embodiment you could ask a robot to “Go to the laundry basket and bring me all of the red shirts.” The language model in the robot would translate your request into the proper code to go get the red shirts, even though the robot was never trained on that task. Another type of embodiment is training a robot using virtual reality simulations. In a simulation, a robot can be trained on thousands of scenarios until the simulation can be swapped for the real world and the robot doesn’t “notice.” This section also includes factory automation and human prosthetics. There will be some overlap with other categories. I first learned about embodiment from Alan Thompson. I highly recommend his video explainer: https://youtu.be/peLqYP9BAUg?si=2FzrvDlw-qaQFaCx.
  • Science/Medicine: AI’s strength is learning patterns. This applies nicely to medical diagnosis and identifying trends. When combined with data and AI vision, this means AI is good at looking at X-rays. Language models are helping with patient interfaces, and robotics and augmented reality are advancing surgery. Powerful models like Google DeepMind’s AlphaFold can predict how proteins fold. Other models can read ancient scrolls without opening them.
  • Video: AI video in this case refers to generative video, just as imagery means generative imagery. This is usually text-to-video, where a user enters a prompt (“a wizard walking out of a flaming building”) and a tool like Pika or Runway generates a video in the likeness of the description. It also covers animation of still images, where an image is given motion (like a photo of a waterfall appearing to have flowing water).
    As with images, this is different than AI vision, where an AI “looks at” an image or video and can derive context, details, and contents. Video, in this case, is video creation and modification/editing.
  • Vision: AI is getting better at “seeing” what is in an image. Reading signs, understanding what objects are present, knowing the context and emotions, and predicting what might happen next. This is part of a broader topic called “multimodality”. It’s one of the coolest and most important areas of AI development.
  • Chips and Hardware: Stories and news about the machines, chips, and hardware that power AI.
  • Technical and Developer Stories: Everything that is too technical for general consumption goes here. These are stories I think are important, but might be inaccessible and confusing. It’s also a space for developer news and deep dives into how AI works, under the hood.

Credits/Sources

Most of these links come from just a few incredible sources.  Please follow them:

51 responses to “AI News #15: Week Ending 01/12/2024 with Executive Summary and Top 13 Stories”

Discover more from Ethan B. Holland
