About This Week’s Covers

Last year, Apple released an AI model called Ferret that could operate a phone. This week, it appears Apple is putting it into motion under the Siri brand. The prompt for the cover image is “A ferret sleeping peacefully next to an iPhone, in a living room on a cozy fall afternoon.” I ran the prompt through Flux, Ideogram, and Midjourney, and Midjourney won by a nose. Here are the competing entries from Flux and Ideogram:

Flux:

Ideogram:

This Week’s Executive Summaries

On October 25, 2024, I wrote an essay called “Apple is pulling a Braveheart and can change the way we use phones whenever they choose.” I wrote it in response to criticism that Apple is behind in AI. I disagree.

Over a year ago, in October 2023, Apple released an open model called Ferret (paper here) that could identify and engage with objects in a user interface: in effect, multimodal understanding of a smartphone. When we restrict our definition of AI to “language bots,” we’re forgetting images, video, vision, and hearing. In those instances of multimodal AI, Apple is further along than conventional wisdom understands.

This week Apple released four documents without fanfare, outlining developer integration with Siri.  I’m sticking with my prediction that web browsers and apps are going to fade away in the next few years.

Apple Boosts Siri and AI Features for Apps with New Developer Tools
Apple has quietly shared new tools to help developers make their apps work better with Siri and Apple Intelligence, its AI system. 

The updates focus on four key areas: First, Siri and Apple Intelligence can now understand and interact with what’s on your screen—like summarizing a webpage or answering questions about content. Second, developers can enable Siri to take actions within their apps, like creating reminders or pulling up specific app features. Third, developers can make their app’s main functions available to Siri so users can interact with them more easily. Lastly, by following Apple’s app schemas—essentially templates for actions—apps can ensure Siri understands their features and provides more natural, conversational assistance.  

Here are the four documents themselves:

“Making onscreen content available to Siri and Apple Intelligence” 
Enable Siri and Apple Intelligence to respond to a person’s questions and action requests for your app’s onscreen content.
https://developer.apple.com/documentation/appintents/making-onscreen-content-available-to-siri-and-apple-intelligence

“Integrating actions with Siri and Apple Intelligence”
Create app intents, entities, and enumerations that conform to assistant schemas to tap into the enhanced action capabilities of Siri and Apple Intelligence.
https://developer.apple.com/documentation/appintents/integrating-actions-with-siri-and-apple-intelligence

“Making your app’s functionality available to Siri”
Add assistant schemas to your app so Siri can complete requests, and integrate your app with Apple Intelligence, Spotlight, and other system experiences.
https://developer.apple.com/documentation/appintents/making-your-app-s-functionality-available-to-siri

“App intent domains”
Make your app’s actions and content available to Siri and Apple Intelligence with assistant schemas.
https://developer.apple.com/documentation/appintents/app-intent-domains
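
To make the “schemas as templates for actions” idea concrete, here is a toy model in Python. It is not Apple’s App Intents API (that is a Swift framework, documented at the links above). It simply assumes a schema is a named template listing required parameters, and that an app action “conforms” when it supplies all of them. Every name here (`Schema`, `AppAction`, `conforms`, `createReminder`) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Schema:
    """Hypothetical action template: a name plus the parameters it requires."""
    name: str
    required_params: frozenset


@dataclass
class AppAction:
    """Hypothetical app action that claims to follow a schema."""
    schema_name: str
    params: dict = field(default_factory=dict)


def conforms(action: AppAction, schema: Schema) -> bool:
    """True when the action targets this schema and supplies every required parameter."""
    return (
        action.schema_name == schema.name
        and schema.required_params <= set(action.params)
    )


# Illustrative schema and actions; the parameter names are assumptions.
create_reminder = Schema("createReminder", frozenset({"title", "due"}))
good = AppAction("createReminder", {"title": "Call mom", "due": "2024-11-09"})
bad = AppAction("createReminder", {"title": "Call mom"})  # missing "due"

print(conforms(good, create_reminder))
print(conforms(bad, create_reminder))
```

The point of the sketch is the contract: an assistant can only act reliably on an action whose shape it knows in advance, which is why a shared template matters more than any single app’s custom wording.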

Speaking of AI agents…

Microsoft’s New Agent System Can Solve Complex Tasks – But Has The Worst Name 
Microsoft Research has introduced Magentic-One, a new AI system that uses teams of agents (little digital workers) to solve complicated tasks. Led by a central “Orchestrator,” these agents work together to handle jobs like coding, browsing the web, or managing files. Unlike chatbots, Magentic-One can tackle multi-step problems, such as running software, shopping, or analyzing data. Microsoft has released the system as open source, built on its AutoGen framework, so developers can build and experiment with it.
Microsoft
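
For readers who want to picture the orchestrator pattern, here is a minimal Python sketch. It is not Magentic-One or the AutoGen API. The three worker functions, the `Orchestrator` class, and the hand-written plan are all hypothetical stand-ins. In a real system a language model would presumably draft and revise the plan; here it is hard-coded so the routing logic is easy to see.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical worker agents: each takes an instruction and returns a result string.
def coder(instruction: str) -> str:
    return f"[coder] wrote code for: {instruction}"


def web_surfer(instruction: str) -> str:
    return f"[web] searched for: {instruction}"


def file_manager(instruction: str) -> str:
    return f"[files] handled: {instruction}"


class Orchestrator:
    """Routes each step of a plan to the named agent and keeps a running log."""

    def __init__(self, agents: Dict[str, Callable[[str], str]]):
        self.agents = agents
        self.log: List[str] = []

    def run(self, plan: List[Tuple[str, str]]) -> List[str]:
        for agent_name, instruction in plan:
            agent = self.agents.get(agent_name)
            if agent is None:
                # Unknown agent: record the gap and move on instead of crashing.
                self.log.append(f"[orchestrator] no agent named {agent_name!r}; skipped")
                continue
            self.log.append(agent(instruction))
        return self.log


orchestrator = Orchestrator({"coder": coder, "web": web_surfer, "files": file_manager})
results = orchestrator.run([
    ("web", "price of a USB-C cable"),
    ("coder", "compare the prices"),
    ("files", "save report.csv"),
])
for line in results:
    print(line)
```

The design choice worth noticing is that the workers know nothing about each other. Only the orchestrator sees the whole task, which is what separates this setup from a single chatbot answering one prompt at a time.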

Google Accidentally Leaks Jarvis AI, a Powerful Computer Assistant
Google briefly leaked a preview of its new AI prototype, Jarvis, through the Chrome Web Store. Designed to handle tasks like online shopping, flight booking, and research (sound similar to the two stories above?), Jarvis can reportedly take control of a computer to perform actions without manual input. Users couldn’t fully access the tool because of permission restrictions, and Google quickly removed the extension. The accidental leak comes ahead of Jarvis’ expected launch in December and signals Google’s competition with rivals like Anthropic’s Claude AI, which already enables advanced computer control.
Engadget 

The coming robotics revolution will change the world.

NVIDIA Launches Tools to Accelerate Robot Learning and Humanoid Development
If you asked me to name only one topic to follow, I’d quickly answer that the NVIDIA robotics program is the coolest thing in AI. This week NVIDIA introduced a suite of AI tools to advance the development of intelligent robots, including humanoids. The headline is NVIDIA Isaac Lab, an open-source framework that lets developers train robots (humanoids, quadrupeds, or collaborative robots) at scale in simulation for increasingly complex tasks. Alongside it, NVIDIA’s Project GR00T provides six workflows that act as blueprints for key humanoid capabilities: generative environments, motion and trajectory generation, dexterous manipulation, whole-body control, mobility, and multimodal sensing. NVIDIA also introduced tools for efficient video data processing, crucial for training AI robots. The Cosmos tokenizer compresses images and video up to 12 times faster than existing tools, while NeMo Curator speeds up video data curation pipelines by up to 7 times.
Nvidia

A Few Quick Headlines and Stats to Know That Don’t Need a Summary

“Sam Altman says AGI is coming in 2025”

https://twitter.com/tsarnick/status/1854988648745517297

OpenAI defeats news outlets’ copyright lawsuit over AI training, for now | Reuters

https://www.reuters.com/legal/litigation/openai-defeats-news-outlets-copyright-lawsuit-over-ai-training-now-2024-11-07

“The latest @Waymo California driverless ride stats are out from @californiapuc. In August 2023: 12,000 for the month. In August 2024: 312,000 for the month. 

https://twitter.com/NatBullard/status/1852721528523375032

Meta is using more than 100,000 Nvidia H100 AI GPUs to train Llama-4 — Mark Zuckerberg says that Llama 4 is being trained on a cluster “bigger than anything that I’ve seen” | Tom’s Hardware

https://www.tomshardware.com/tech-industry/artificial-intelligence/meta-is-using-more-than-100-000-nvidia-h100-ai-gpus-to-train-llama-4-mark-zuckerberg-says-that-llama-4-is-being-trained-on-a-cluster-bigger-than-anything-that-ive-seen

AI Visuals and Charts: Week Ending 11/08/2024

These links are worth seeing to be sure you’re up to speed.

“If only your browser could see what I see… oh wait, Copilot Vision will be able to very soon  

https://twitter.com/MSFTCopilot/status/1852436466640503228

Deep Imagination Research | NVIDIA
Cosmos Tokenizer: A suite of image and video neural tokenizers (models that compress images and video into tokens)

https://research.nvidia.com/labs/dir/cosmos-tokenizer

Learning in 2 Days, Human-Like Natural Walking – YouTube

“AdvancedLivePortrait-WebUI Dedicated gradio based WebUI for ComfyUI-AdvancedLivePortrait edit the facial expression from the image 

X-Portrait 2: Highly Expressive Portrait Animation

“It’s wild just how expressive X-Portrait 2 is! What you’re seeing is the video on the bottom right “driving” the performance of the still image on the bottom left. Unlike Runway’s Act-One it can transfer fast head movements, minuscule expression changes and stronger emotions. 

Top 66 Links of The Week – Organized by Category 

Agents and Copilots: AI News Week Ending 11/08/2024

Octoverse: AI leads Python to top language as the number of global developers surges – The GitHub Blog

Microsoft’s Copilot AI is coming to your Office apps – whether you like it or not | ZDNET

AGI (Artificial General Intelligence): AI News Week Ending 11/08/2024

Devious humour and painful puns: will the cryptic crossword remain the last thing AI can’t conquer? | Crosswords | The Guardian

“Paper shows that AI (in this case a diffusion model) accelerates innovation. Among key findings: 1) GenAI increases novel discoveries: a 39% increase in patent filings! 2) It boosts the best performers by acting as a co-intelligence 3) It takes away some of the fun parts of work 

Amazon: AI News Week Ending 11/08/2024

Prime Video’s X-Ray Recaps Feature Keeps You Up to Speed on Your Favorite Shows

Amazon will now use AI to recap what you’re watching

Anthropic: AI News Week Ending 11/08/2024

“The new Claude 3.5 Haiku is here. ⚡️ In my experience, this is one of the most fun models to use. And with that speed and cost, I can see people building a LOT of cool stuff with it. Also, it beats the old 3.5 Sonnet and Opus. How amazing is that?! 

Claude 3.5 Haiku \ Anthropic

“Haiku 3.5 is 4x more expensive than Haiku 3.0 Strange move by Anthropic. They claim it’s because Haiku is a good model. It doesn’t make much sense… With all fast-moving tech, we should see both costs go down by 10x while performance increases by 10x The correct answer was” / X

“Anthropic’s Claude 3.5 Haiku release is a significant jump in intelligence from Claude 3 Haiku but its higher price makes it a tricky choice for developers Claude 3.5 Haiku now achieves an Artificial Analysis Quality Index of 69, substantially above Claude 3 Haiku’s 54 and just 

Anthropic hikes the price of its Haiku model | TechCrunch

“Claude 3 Haiku remains available for use cases that benefit from image input or its lower price point. https://t.co/7eGCsTxBT4” / X – https://x.com/AnthropicAI/status/1853498272863691125

“Perplexity now supports @AnthropicAI’s Claude 3.5 Haiku (released yesterday) as a replacement for Claude 3 Opus. Retiring Claude 3 Opus keeps Perplexity up-to-date on the latest models from Anthropic, so that you receive the best experience possible. 

Apple: AI News Week Ending 11/08/2024

iOS 18.2 Beta 2 Shows Siri ChatGPT Limit, Offers ‘Plus’ Upgrade Option – MacRumors

Augmented and Virtual Reality (AR/VR): AI News Week Ending 11/08/2024

“DimensionX Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion 

“Visualizing drone flight paths in mapbox — a well designed map is pure visual umami 😍 

New role at Netflix: VP, GenAI for Games

“Introducing Oasis: the first playable AI-generated game. We partnered with @DecartAI to build a real-time, interactive world model that runs >10x faster on Sohu. We’re open-sourcing the model architecture, weights, and research. Here’s how it works (and a demo you can play!): 

Education: AI News Week Ending 11/08/2024

FrontierMath | Epoch AI
A math benchmark testing the limits of AI

Ethics/Legal/Security: AI News Week Ending 11/08/2024

Order on Motion to Dismiss – #117 in Raw Story Media, Inc. v. OpenAI Inc. (S.D.N.Y., 1:24-cv-01514) – CourtListener.com

Google’s ‘Big Sleep’ AI Project Uncovers Real Software Vulnerabilities | PCMag

Project Zero: From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code

Fashion models step aside as AI glam bots join them in ads: ‘It’s about faster content creation’

Interactive AI Interviews are going to become a new norm, expert says

The Beatles Now And Then First AI-Assisted Song With Grammy Nomination

Anthropic and Palantir Partner to Bring Claude AI Models to AWS for U.S. Government Intelligence and Defense Operations | Business Wire

Palantir IR – News

Google: AI News Week Ending 11/08/2024

“A nice example of simulated “theory of mind” I showed Gemini 1.5 Apple’s “hydraulic press” iPad ad when it came out: “The ad could be seen as sending a mixed message or even a negative one, suggesting that the new iPad might lead to the destruction of other valuable things” Yep 

International: AI News Week Ending 11/08/2024

“it is critically important that the US maintains its lead in developing AI with democratic values.” / X

Exclusive: US ordered TSMC to halt shipments to China of chips used in AI applications | Reuters

Mystery Surrounds Discovery of TSMC Tech Inside Huawei AI Chips – WSJ

Meta: AI News Week Ending 11/08/2024

“Ollama 0.4 is released with support for Meta’s Llama 3.2 Vision (11B and 90B) models. @AIatMeta See what it can do: 👇👇👇 

“@AIatMeta Examples of running Llama 3.2 vision: Reading handwriting 

“Today at Meta FAIR we’re announcing three new cutting-edge developments in robotics and touch perception — and releasing a collection of artifacts to empower the community to build on this work. Details on all of this new work ➡️ 

Meta is making a robot hand that can ‘feel’ touch | TechCrunch

“We’re at #CoRL2024 this week in Munich presenting our latest robotics research. On the ground in Munich? We’re demoing our work on Meta Sparsh & Meta Digit 360 + more at our booth. Following from your feed? Read more about this work ➡️ 

Multimodality: AI News Week Ending 11/08/2024

X-Portrait 2: Highly Expressive Portrait Animation

“It’s wild just how expressive X-Portrait 2 is! What you’re seeing is the video on the bottom right “driving” the performance of the still image on the bottom left. Unlike Runway’s Act-One it can transfer fast head movements, minuscule expression changes and stronger emotions. 

“AdvancedLivePortrait-WebUI Dedicated gradio based WebUI for ComfyUI-AdvancedLivePortrait edit the facial expression from the image 

OpenAI: AI News Week Ending 11/08/2024

T-Mobile Agreed to Pay OpenAI $100 Million Over Three Years for AI — The Information

ChatGPT Can Now Control a Robot Arm

“After working at OpenAI for almost 7 years, I decide to leave. I learned so much and now I’m ready for a reset and something new. Here is the note I just shared with the team. 🩵 

How To Build The Future: Sam Altman – YouTube

OpenAI reportedly developing new strategies to deal with AI improvement slowdown | TechCrunch

OpenAI Shifts Strategy as Rate of ‘GPT’ AI Improvements Slows — The Information

Order on Motion to Dismiss – #117 in Raw Story Media, Inc. v. OpenAI Inc. (S.D.N.Y., 1:24-cv-01514) – CourtListener.com

Open Source: AI News Week Ending 11/08/2024

“Since our first version of Hermes was released over a year ago, many people have asked for a place to experience it. Today we’re happy to announce Nous Chat, a new user interface to experience Hermes 3 70B and beyond. 

The Gap Between Open and Closed AI Models Might Be Shrinking | TIME

Podcasts/YouTube/Op-Eds: AI News Week Ending 11/08/2024

Open Source AI Can Help America Lead in AI and Strengthen Global Security | Meta – https://about.fb.com/news/2024/11/open-source-ai-america-global-security/

“🎙️ Launching AI Talk with a fascinating first guest – Junyang Lin (@JustinLin610), the mind behind Qwen. Ever wondered how Chinese AI labs operate? From building one of the strongest open-source LLMs to China’s unique AI ecosystem Plus, Junyang hints at what’s next👀 #AI #ML 

How To Build The Future: Sam Altman – YouTube

Publishing: AI News Week Ending 11/08/2024

OpenAI defeats news outlets’ copyright lawsuit over AI training, for now | Reuters

Mistral launches a moderation API | TechCrunch

As generative AI gets better, what will happen to artists? | TechCrunch

Interactive AI Interviews are going to become a new norm, expert says

The Beatles Now And Then First AI-Assisted Song With Grammy Nomination

“We @MistralAI just released the moderation API and the batch API. Details in 🧵: 

Mistral Moderation API | Mistral AI | Frontier AI in your hands

The Washington Post Launches “Ask The Post AI,” a New Search Experience – The Washington Post

How The New York Times is using generative AI as a reporting tool – Ars Technica

LLMs help reporters transcribe and sort through hundreds of hours of leaked audio.

Robotics and Embodiment: AI News Week Ending 11/08/2024

Physical Intelligence, a Specialist in Robot A.I., Raises $400 Million – The New York Times

Learning Visual Parkour from Generated Images

NVIDIA Advances Robot Learning, Humanoid Development With New AI and Simulation Tools | NVIDIA Blog

“I’m delighted to share that I’m joining @OpenAI to lead robotics and consumer hardware! In my new role, I will initially focus on OpenAI’s robotics work and partnerships to help bring AI into the physical world and unlock its benefits for humanity. 

Video News: AI News Week Ending 11/08/2024

“AdvancedLivePortrait-WebUI Dedicated gradio based WebUI for ComfyUI-AdvancedLivePortrait edit the facial expression from the image https://x.com/_akhaliq/status/1854222095477318096


Discover more from Ethan B. Holland
