The main cover image prompt was A floral poster in the style of Kehinde Wiley with a post-apocalyptic theme. Text added with Photoshop.

AI News #53: Week Ending 10/04/2024 with Executive Summary, Top 42 Links, and Helpful Visuals

October 5, 2024

About This Week’s Cover

This week’s cover theme was meant to keep it quick and easy. I asked Ideogram to make covers in the style of one of my favorite artists, Kehinde Wiley. The prompt was: A floral poster in the style of Kehinde Wiley with a [category name] theme and the word “[category name]” in bold text. The main cover image prompt was: A floral poster in the style of Kehinde Wiley with a post-apocalyptic theme. Text added with Photoshop. The category covers turned out incredibly well. Here are a few of my favorites:

This Week’s Executive Summaries

Nvidia Launches Surprise Open-Source AI Model to Compete with GPT-4
Nvidia has just released a very impressive open-source AI model family called NVLM 1.0, designed to tackle both text and visual tasks on par with proprietary AI like GPT-4. At the core of this new family is the NVLM-D-72B model, with 72 billion parameters, making it one of the most powerful open AI models. Nvidia’s move to openly share the model’s code and training details breaks industry norms, as most tech giants keep their advancements private. This means researchers and smaller organizations can now access tools that were previously limited to big tech, potentially accelerating AI progress. NVLM-D-72B is multimodal and strong at interpreting images and text, including analyzing memes and solving math problems step-by-step. Unusually, the model gets better at handling text-only tasks after visual training, a rare boost that sets it apart. By making the advanced model accessible, Nvidia could drive a wave of innovation and collaboration, putting even more heat on OpenAI to keep moving quickly as it loses its lead to open source every few months.
Venturebeat | huggingface (try it)

The craziest moment I’ve seen in six months
It actually freaks me out how well the AI handled this. I’d say it did better than 99% of people could. “Someone on Reddit asked NotebookLM to analyze a text containing only “poop” and “fart” written 1,000 times. The result is… incredible.”
https://x.com/fdaudens/status/1841305953116758220

Meta’s Movie Gen: Video Editing With Text Prompts
Meta’s new Movie Gen tool can create and edit high-quality videos with just text prompts, setting a new bar for personalized AI-driven content. Users can type in simple descriptions to produce custom videos, edit scenes, add styles, and even generate soundtracks—all without needing advanced editing skills. Movie Gen can create HD videos in different formats and also personalize content; for instance, uploading a selfie and adding text instructions will generate videos that reflect the user’s appearance and movements.
Meta | ShellySheynin | AIatMeta

“Meta unveiled Movie Gen, which they claim is the “most advanced media foundation model to-date” It can generate high-quality AI videos from text, but what’s really cool is the ability for it to do precise video editing. Game changer for Hollywood:
https://twitter.com/adcock_brett/status/1842958865198981619

California Governor Vetoes AI Safety Bill, Sparking Debate Over Innovation vs. Regulation
California Governor Gavin Newsom recently vetoed a major AI safety bill, SB 1047, which would have forced large AI companies to adopt strict safety protocols. Newsom argued that the bill went too far by applying broad rules to all large systems, potentially stalling innovation and placing an unfair burden on AI companies. He raised concerns that the bill might create a false sense of security without truly addressing fast-evolving AI risks. Supporters of the bill, including Senator Scott Wiener, Elon Musk, and Hollywood figures like Mark Hamill, argue that binding safeguards are crucial as AI’s impact grows. Important callout: Musk likely sees strict AI regulations as a way to slow competitors like Google and Meta, potentially giving his companies an advantage. The bill’s opponents, including tech giants like Google and Meta, predictably warned that it could stifle innovation, harm business growth, and disrupt California’s role as a tech leader.
Politico | theverge | techcrunch

OpenAI Launches “Canvas” for Enhanced Collaboration with ChatGPT (yes, collaboration with the AI)
OpenAI has introduced “Canvas,” a new workspace in ChatGPT designed to make working on writing and coding projects smoother and more interactive. Instead of just chatting, users can now collaborate directly with ChatGPT in a shared editing environment that allows for inline suggestions, edits, and adjustments. Canvas helps ChatGPT understand what users are trying to achieve by letting them highlight specific sections and request targeted feedback. This setup gives users control to refine writing, adjust readability, debug code, and even translate code into other languages. With shortcuts for tasks like adding comments or changing the reading level, Canvas makes ChatGPT feel more like a real-time editor or coding assistant.

Examples of OpenAI Canvas in action:

Do research in canvas! GPT4o with canvas can do research about art history, write a report, you can ask to verify its claims and then add citations and bibliography:
https://twitter.com/karinanguyen_/status/1841889811931791642

Search for best restaurants and invite a friend to one of them in a written email
https://twitter.com/karinanguyen_/status/1841889814230061480

GPT4o with canvas writes code in rust and then reviews code
https://twitter.com/karinanguyen_/status/1841889815689637979

Canvas model can browse for recipes and invent a new one based on the ingredients you have!
https://x.com/karinanguyen_/status/1841889817522520246

ChatGPT’s new canvas interface is a game changer. Just used it to create a tesseract/hypercube visualizer with ThreeJS. Loving the unified UX — chat, inline comments, and watching GPT-4o work its magic on the code — all in one place.
https://twitter.com/bilawalsidhu/status/1841906953083068452

Vertical AI Agents: Disruption Across Industries – Potential Job Losses
AI agents focused on specific industries, known as “vertical AI agents,” are quickly transforming the landscape of software and services. Industry experts predict these tailored AI systems will drive the next wave of billion-dollar SaaS companies, reshaping fields from call centers to legal services. Recent analysis by firms like Felicis and CB Insights shows AI agents advancing rapidly, with over 50 companies joining the space in the last two years alone. That said, Ethan Mollick argues that the AI labs are aiming for general-purpose agents that would swallow up the niche-focused ones, putting not only our jobs but entire industries up for grabs. These agents can perform specialized tasks far more cheaply than people can. At OpenAI’s Realtime API pricing, for example, call center capabilities work out to roughly $9 per hour, a fraction of standard labor costs.
Ycombinator | chiefaioffice | CBinsights | just_watt | jowyang

“When I see this I realize most people, even those in AI, don’t get the vision of the AI labs When they talk about their agents, they mean generalized ones. Industry-specific knowledge gets subsumed. They may fail but they are aiming for that, which would kill most of these firms”
https://twitter.com/emollick/status/1841142712159932660

“Call Center industry is so DONE. The official @OpenAI Real-time Pricing is $0.06 per minute of audio input and $0.24 per minute of audio output. So the average cost of 5 minutes call with the Real Time API is $0.75, that’s $9 / hour. And the average call center Human agent”
https://twitter.com/rohanpaul_ai/status/1841833425432449066
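
A quick back-of-the-envelope check of that $9 figure. The tweet does not say how a call splits between the caller talking (audio input) and the AI talking (audio output), so this sketch assumes an even 50/50 split. That split, along with the function and variable names, is my assumption and not something OpenAI publishes.

```python
# Per-minute prices as quoted in the tweet above (USD).
INPUT_PER_MIN = 0.06   # audio the model listens to
OUTPUT_PER_MIN = 0.24  # audio the model speaks


def call_cost(minutes, output_share=0.5):
    """Estimate the cost of a call in USD.

    output_share is the fraction of the call spent on AI speech.
    The default of 0.5 is an assumption, not a published figure.
    """
    output_minutes = minutes * output_share
    input_minutes = minutes - output_minutes
    return input_minutes * INPUT_PER_MIN + output_minutes * OUTPUT_PER_MIN


if __name__ == "__main__":
    print(f"5-minute call: ${call_cost(5):.2f}")
    print(f"One hour:      ${call_cost(60):.2f}")
```

With the even split, the numbers match the tweet: about $0.75 for a five-minute call and about $9 for an hour. If the AI did all the talking (output_share=1.0), the hour would cost closer to $14.40, so the real figure depends heavily on who talks more.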

See it to believe it moment: browsing the web with a voice agent
“Sound On. This is what will come the ChatGPT desktop in a few months. Feel the AGI moment!
The new Realtime API with web crawling is mind-blowing! Talk in realtime with any website. Check it out:
https://twitter.com/8teAPi/status/1842271653222666543
https://twitter.com/nickscamara_/status/1842243883842904529

AI Visuals and Charts: Week Ending 10/04/2024

Demonstrations of Pika’s Image to Video

“Not what massaging the data means. (I have been randomly animating scientific diagrams with Pika)
https://twitter.com/emollick/status/1841345969184498168

“Pika 1.5 is pretty wild. When I said generative AI would let us edit reality, this is not what I had in mind… lol
https://twitter.com/bilawalsidhu/status/1841195247184781420

Lip Sync Generation from an Image

“This is Hollywood grade lip-syncing: Here is how to create an AI avatar with accurate and realistic Lip-Sync. This might be currently the best tool for video-to-video right now. Only 4 steps are needed:
https://twitter.com/HalimAlrasihi/status/1839310216602788103

Incredible Internet Voice Browsing Using an AI Agent

“Sound On. This is what will come the ChatGPT desktop in a few months. Feel the AGI moment!
The new Realtime API with web crawling is mind-blowing! Talk in realtime with any website. Check it out:
https://twitter.com/8teAPi/status/1842271653222666543
https://twitter.com/nickscamara_/status/1842243883842904529

Two Examples Of AI Competing with Human Creativity and Expression
“AI humor? Someone on Reddit asked NotebookLM to analyze a text containing only “poop” and “fart” written 1,000 times. The result is… incredible. “Is someone messing with us to see if we’ll spend an entire deep dive discussing poop and fart?”
https://twitter.com/fdaudens/status/1841305953116758220

“NotebookLM Podcast Hosts Discover They’re AI, Not Human—Spiral Into Terrifying Existential Meltdown Via Reddit
Karpathy comments: “I think I’d be more impacted if they displayed an understanding of their existence as prompted language models generating token sequences. As is, it’s more of a word salad of internet grade AI tropes, but it certainly takes it up a notch with the voice and conversation format.”
https://twitter.com/kimmonismus/status/1839975655150064124

Agents and Copilots

“It is great to see many companies finding success in automating customer support. @stochasticai we are building end-to-end customer service agents that have automated 60-70% of a F500 client’s customer support volume. DM me if you’d like to learn more.”

https://twitter.com/glennko/status/1842869624595198098

“2/ Search for best restaurants and invite a friend to one of them in a written email:

https://twitter.com/karinanguyen_/status/1841889814230061480

“AI voice agents are on 🔥 We’re moving from the innovator -> early adopter part of the curve, with new startups sprouting up weekly to serve different verticals. What @illscience and I are seeing @a16z, and why we’re excited 👇

https://twitter.com/omooretweets/status/1841143621434949809

“✨New Report✨ AI’s next big thing: “agents” that can independently pursue complex, real-world goals. Explore our new CSET-led workshop report for more on what agents are and what we might need to do to be ready for them.

https://twitter.com/CSETGeorgetown/status/1841841631760122333

Google

Google Is Working on Reasoning AI, Chasing OpenAI’s Efforts – Bloomberg

https://www.bloomberg.com/news/articles/2024-10-02/google-is-working-on-reasoning-ai-chasing-openai-s-efforts

Anthropic

Anthropic hires OpenAI co-founder Durk Kingma | TechCrunch

“Personal news: I’m joining @AnthropicAI! 😄 Anthropic’s approach to AI development resonates significantly with my own beliefs; looking forward to contributing to Anthropic’s mission of developing powerful AI systems responsibly. Can’t wait to work with their talented team,”

https://twitter.com/dpkingma/status/1841134573595312344

Augmented and Virtual Reality (AR/VR)

“Everything you do wearing AR glasses will be reconstructed in 3D in world space. EgoLM tracks and understands egocentric (aka PoV) motions from multimodal inputs, e.g. on your AR glasses

https://twitter.com/bilawalsidhu/status/1839869877835690451

“WiLoR can localize and reconstruct multiple hands in real-time from single images! It achieves smooth 3D hand tracking with high accuracy, using a large dataset of over 2 million hand images. Links ⬇️

https://twitter.com/dreamingtulpa/status/1840661047843094687

“Thrilled to welcome @_tim_brooks to @GoogleDeepMind. So excited to be working together to make the long-standing dream of a world simulator a reality!!”

https://twitter.com/demishassabis/status/1841984103312208037

Consumer Products: Week Ending 10/04/2024

Airlines turn to AI to allocate gates and cut waiting times

https://www.bbc.com/news/articles/c80e54yjzdmo

Ethics/Legal/Security AI

Someone Put Facial Recognition Tech onto Meta’s Smart Glasses to Instantly Dox Strangers

https://www.404media.co/someone-put-facial-recognition-tech-onto-metas-smart-glasses-to-instantly-dox-strangers

OpenAI asks investors not to back rival start-ups such as Elon Musk’s xAI

https://www.ft.com/content/66e0653e-c446-47b2-8a7f-baa54ccbfb9a

AI Can Best Google’s Bot Detection System, Swiss Researchers Find – Decrypt

https://decrypt.co/251107/ai-can-best-googles-bot-detection-system-swiss-researchers-find

Imagery

Black Forest Labs releases Flux 1.1 Pro and an API | VentureBeat

Announcing FLUX1.1 [pro] and the BFL API – Black Forest Labs

International

ByteDance will reportedly use Huawei chips to train a new AI model

https://www.engadget.com/ai/bytedance-will-reportedly-use-huawei-chips-to-train-a-new-ai-model-154846749.html

Microsoft AI

Introducing Copilot Labs and Copilot Vision | Microsoft Copilot Blog

Microsoft’s Copilot AI Gets a Voice, Vision, and a ‘Hype Man’ Persona | WIRED

https://www.wired.com/story/microsoft-copilot-vision-voice-emotional-support-windows-office

An AI companion for everyone – The Official Microsoft Blog

https://blogs.microsoft.com/blog/2024/10/01/an-ai-companion-for-everyone

Microsoft brings AI-powered overviews to Bing | TechCrunch

Multimodality

Introducing vision to the fine-tuning API | OpenAI

https://openai.com/index/introducing-vision-to-the-fine-tuning-api

You can now use Pythagora to build full stack, production-ready apps through chat.

The VScode extension uses 14 AI Agents that work together to plan, write code, review it, build test, debug and deploy.

It manages the whole development process and will ask you whenever it… pic.twitter.com/vMQ2KMq3o0
— Lior⚡ (@LiorOnAI) October 2, 2024

voice mode has function calling and is weirdly obsessed with strawberrries

he is integrating with @twilio api and ordering strawberries for all ofnus! classic twilo demo pic.twitter.com/yMoehMLCcX
— swyx (@swyx) October 1, 2024

Folks do not realize the wide implications of multimodal video + huge context windows + AI.

I uploaded a 2009 video of a crowded street scene and Gemini 1.5 was able to answer detailed questions about what happened in it, down to individual car brands & type with good accuracy. pic.twitter.com/hQ1s3gHX81
— Ethan Mollick (@emollick) September 30, 2024

Google’s Visual Search Can Now Answer Even More Complex Questions | WIRED

https://www.wired.com/story/google-lens-multimodal-search

OpenAI

Behind OpenAI’s Audacious Plan to Make A.I. Flow Like Electricity – The New York Times

OpenAI is Revamping Sora AI Video — The Information

https://www.theinformation.com/articles/openai-is-revamping-sora-ai-video

OpenAI Discusses Giving Sam Altman 7% Stake in For-Profit Transition – Bloomberg

https://www.bloomberg.com/news/articles/2024-09-25/openai-cto-mira-murati-says-she-will-leave-the-company?embedded-checkout=true

OpenAI Completes Deal That Values Company at $157 Billion – The New York Times

OpenAI Is Growing Fast and Burning Through Piles of Money – The New York Times

OpenAI closed a new $6.6B funding round, now valuing the company at $157B.

This solidifies its position as the most well-funded AI startup in the world.

Nice.

(Chart from @chartrdaily) pic.twitter.com/Wdgj0CRGFW
— Brett Adcock (@adcock_brett) October 6, 2024

SoftBank to Invest $500 Million in OpenAI — The Information

https://www.theinformation.com/articles/softbank-to-invest-500-million-in-openai

As usual, OpenAI failed to emphasize the real-game changer feature at their Dev Day: audio output from the standard generation API.

This has severe implications for text-to-speech apps, particularly if the audio output style is as steerable as the gpt-4o voice demos. pic.twitter.com/K9odEULfbv
— Max Woolf (@minimaxir) October 1, 2024

🚨 OpenAI just dropped a new open-source model 🚨

Whisper V3 Turbo is a new Whisper model with:

– 8x faster relative speed vs Whisper Large
– 4x faster than Medium
– 2x faster than Small
– 809M parameters
– Full multilingual support
– Minimal degradation in accuracy pic.twitter.com/dMo7XV4r5Q
— Baseten (@basetenco) September 30, 2024

Open Source AI

Nvidia just dropped a bombshell: Its new AI model is open, massive, and ready to rival GPT-4 | VentureBeat
https://venturebeat.com/ai/nvidia-just-dropped-a-bombshell-its-new-ai-model-is-open-massive-and-ready-to-rival-gpt-4/

“Nvdia’s NVLM-D 1.0 72B is 🔥 On par with Llama 3.1 405B on Math and Coding. Creative Commons Attribution Non Commercial 4.0 International 🚀 Model Overview: – Multimodal LLM family achieving SOTA on vision-language tasks – Competitive with GPT-4o and open models like Llama
https://x.com/rohanpaul_ai/status/1841497232253890937

Perplexity

“⌘ + ⇧ + P — coming soon. Pre-order now: (this is cryptic code to tease their new Mac app coming soon)

https://t.co/Z9UF7og614 pic.twitter.com/S7sSTWO2KS
— Perplexity (@perplexity_ai) September 30, 2024

Publishing

Google’s AI search summaries officially have ads – The Verge

https://www.theverge.com/2024/10/3/24260637/googles-ai-overview-ads-launch

What keeps newsroom leaders up at night? https://t.co/vw26jMGZxu
— Florent Daudens (@fdaudens) October 4, 2024

NEW 🔥: Perplexity now can show extensive financial data for major stock tickers and not only!

Balance sheets, income statements, cash flows and more 👀👀👀 pic.twitter.com/iiCd0pVmbV
— TestingCatalog News 🗞 (@testingcatalog) September 30, 2024

Envisioning the Future of News: Launching the AI-Driven Journalism Startups of 2030

https://www.hackshackers.com/envisioning-the-future-of-news-launching-the-ai-driven-journalism-startups-of-2030-2

Microsoft brings AI-powered overviews to Bing | TechCrunch

Robotics and Embodiment

I had the privilege of touring Figure's Sunnyvale office yesterday, though calling it just an office would be a massive understatement. There's a whole lot more happening under that roof than you'd expect.

The first thing that struck me was how densely packed the space is.… pic.twitter.com/36yNtJLIlY
— The Humanoid Hub (@TheHumanoidHub) October 2, 2024

Video

PIKA 1.5 IS HERE. With more realistic movement, big screen shots, and mind-blowing Pikaffects that break the laws of physics, there’s more to love about Pika than ever before. Try it.
https://x.com/pika_labs/status/1841143349576941863