About This Week’s Cover Images

The main cover this week is inspired by Apple’s Worldwide Developers Conference, which dominated AI headlines. Pundits have been writing off Apple for months, saying the company missed the AI window. However, Apple completely crushed it this week. They announced a local model that is trained for Apple tasks and can handle most UI/UX needs. Apple then announced a secure cloud-based model to assist with tougher tasks. Local models are contained, and the secure cloud is locked down. Apple is a major company with a boutique brand. To tie those together, this week’s cover is an Apple-branded designer bag that is locked and secure. The Midjourney prompt was: “an open leather briefcase with the Apple logo. Golden light glows from inside it. --chaos 100 --ar 4:3 --style raw --personalize 9zxyhz8”. I upscaled the case with Magnific.AI. The colors behind the case are the theme from the WWDC event, and the font is Apple’s San Francisco brand font.

This week’s category cover prompt celebrates Apple’s dominance: “a sunny day with blue skies. an apple orchard. A [object] and a wooden sign that says “[category name]” --chaos 40 --ar 4:3 --style raw --personalize 9zxyhz8”. Notice the personalize parameter. That’s a new option that uses your personal rankings of Midjourney images to create a tone and texture based on your preferences. Chaos is my new favorite parameter. It controls how varied your results are. 100 is pure chaos. I like 40 at the moment.
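If you want to reuse the category prompt across many categories, filling in the two placeholders can be scripted. Here is a minimal sketch in Python, assuming you paste the result into Midjourney yourself. The names `TEMPLATE` and `build_prompt` are my own for illustration and are not part of any Midjourney tool. Note that Midjourney parameters are typed with two hyphens.

```python
# Hypothetical helper (not a Midjourney API): fills the [object] and
# [category name] placeholders in the category cover prompt.
TEMPLATE = (
    "a sunny day with blue skies. an apple orchard. "
    'A {obj} and a wooden sign that says "{category}" '
    "--chaos {chaos} --ar 4:3 --style raw --personalize 9zxyhz8"
)


def build_prompt(obj: str, category: str, chaos: int = 40) -> str:
    """Return the cover prompt for one category.

    chaos is limited to 0-100 here, with 100 being the most varied results.
    """
    if not 0 <= chaos <= 100:
        raise ValueError("chaos must be between 0 and 100")
    return TEMPLATE.format(obj=obj, category=category, chaos=chaos)


if __name__ == "__main__":
    # Example categories are illustrative only.
    for obj, category in [("robot", "Robotics"), ("microphone", "Audio")]:
        print(build_prompt(obj, category))
```

Keeping chaos as an argument makes it easy to try the same category at 40 and at 100 and compare the results.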

Executive Summary

The End of Search Engine Optimization Part 1: Re-captioning Billions of Web Images with LLaMA-3

Researchers used LLaMA-3 to re-caption 1.3 billion images and released the dataset as open source. By addressing the inherent noisiness of web-crawled image-text pairs, this effort aims to enhance model training for vision-language tasks, such as text-to-image generation. However, I see it as the continued end of the need for text-based metadata on web content.

The End of Search Engine Optimization Part 2: Microsoft DenseAV Learns Sound, Language, and Context from Watching Videos Without Help

Imagine searching for anything in any video (on your phone or the internet), even if it’s not explicitly labeled or mentioned. DenseAV understands sounds and language just by watching videos, without needing transcripts, labels, or additional information. DenseAV figures out the meanings of words and where sounds are coming from by analyzing the video and audio together to learn their connections. For example, it knows a dog’s bark means there’s a dog present in the video even if it’s not shown.

Apple Unveils Advanced Siri and AI Integrations at WWDC

At WWDC, Apple revealed a revamped Siri capable of understanding icons, widgets, and text on iOS devices. Siri can operate within apps, offering contextual awareness without a user opening the app itself! The new “Apple Intelligence” can prioritize emails, summarize content, and act as a virtual executive assistant.  Apple also announced a partnership with OpenAI to integrate ChatGPT into iOS 18, iPadOS 18, and macOS. The integration of ChatGPT isn’t as deep as expected, as Apple confirmed they trained a variety of their own language and image models to run on the device and in the cloud.  See links below for incredible walk-throughs.

OpenAI Live Demos Incredible Multimodal Audio – Must See

OpenAI demonstrated GPT-4o’s still-unreleased multimodal audio during New York Tech Week. It supports real-time interaction, allowing natural-feeling interruptions by both the user and the AI. Unlike traditional transcription-based systems, multimodal audio is signal-based and is never converted to text. The demo started with an AI friend giving an increasingly dramatic pep talk in a variety of dialects. The second half showed the audio’s capability to switch seamlessly between English, Italian, and Mandarin translations in real time. It’s one of those must-see demos that is actually a bit intimidating. See links below.

Apple’s New Standard in Confidential Computing

Apple’s “Private Cloud Compute” faced initial skepticism, but it’s emerging as a groundbreaking innovation in confidential computing. Unlike typical cloud services, Apple’s system ensures absolute privacy: the company can’t access user data, and it isn’t retained, logged, or used for debugging. Even with physical access to a node, attackers can’t compromise the system, thanks to Apple’s custom-built server hardware. Apple openly invites security researchers to verify these claims, highlighting transparency. Notably, Apple’s cloud is so secure it can’t comply with law enforcement data requests.  Apple has set a new benchmark in deploying AI without compromising user privacy.  Links below.

TikTok Releases Depth Anything V2

Depth estimation is the process of determining the distance of objects from the camera, which is crucial for applications like augmented reality and robotics. It’s worth watching the demo to be sure you understand depth. Depth and segmentation are two important concepts to know if you want to intuitively see where robotics is heading. Depth Anything V2, developed by TikTok, is a cutting-edge model for estimating depth from a single camera image. Trained on over 62 million real images and 595,000 synthetic images, the model provides finer details and greater robustness, and is ten times faster and smaller than previous models. Links below.

OpenAI Adds Former NSA Chief to Board of Directors

OpenAI announced the addition of former NSA head and retired General Paul Nakasone to its board of directors and its newly formed Safety and Security Committee. This signals a sense of urgency around the development of artificial general intelligence (AGI) and national security. Nakasone, with extensive experience in cybersecurity and national security, is expected to enhance OpenAI’s safety protocols. OpenAI board chair Bret Taylor highlighted the importance of securely building and deploying AI innovations. Senator Mark Warner praised the decision, emphasizing Nakasone’s expertise in cybersecurity and election security, particularly in the context of competition with China. The move comes amidst ongoing debate about whether private companies can effectively keep AI secure.

Kling: Top Three AI Video Engine – Must-See Examples Below

Last week I covered a new video engine called Kling, introduced by Kuaishou.  This is the company behind China’s second most popular short video app, Kuaishou (branded Kwai internationally), with 400 million daily active users.  Kuaishou is just behind TikTok, which has 600 million daily active users.  Kling can transform text into video clips up to 2 minutes long at 1080p resolution, supporting various aspect ratios.  Since launching last week, incredible examples of the text-to-video have poured in.  I have selected my favorite dozen or so, below.  They are a must see.

Luma AI Unveils Dream Machine: AI Video Generator (especially good at animating memes)

Luma AI has launched Dream Machine, an AI tool that extends static images into dynamic videos. Unlike most AI video generators, Dream Machine is free and publicly accessible. Users can generate 5-second video clips with realistic motion and cinematography, transforming simple snapshots into engaging narratives. Dream Machine’s advanced transformer model is trained directly on videos, enabling it to produce physically accurate and consistent scenes. It is highly efficient, generating 120 frames in just 120 seconds.  My favorite examples are in the links below.

Pandora: Real-Time Language-Controlled Video Generation

Pandora is capable of generating videos of world states with real-time language control. This means that unlike traditional text-to-video models, Pandora allows for free-form requests and actions during video generation, enabling on-the-fly adjustments. For example: “The woman sits up. Waves her hand. Looks to the right.” This interactive capability also enhances robust reasoning and planning, fulfilling the promise of dynamic content creation. Pandora can predict alternative futures, offering diverse videos from the same initial state based on different inputs. It can simulate various domains, including indoor/outdoor, natural/urban, human/robot, and 2D/3D scenarios. Examples and links to the paper are below.

The 7 Must-Click Links Of The Week

Apple

OpenAI

  • (must see audio demo) “sharing a peek of the 4o demo we did at @openai’s New York tech week reception! met so many cool ny founders and builders – huge props to the team for making this happen!”  A pump-up friend that can be interrupted and interrupt you too.  Truly multimodal audio is live and not transcribing audio.  It’s a completely new concept that even I’m still digesting.  https://twitter.com/ilanbigio/status/1799513432741265687
  • (must see audio demo) Real time multimodal audio dialog demo with translations switching between English, Italian and Mandarin.   https://twitter.com/ilanbigio/status/1799513619450806703

Microsoft’s Self-learning Video Context AI (the end of SEO)

  • “DenseAV is an algorithm capable of discovering the meaning of language and locations of sounds just by watching unlabeled videos. DenseAV is completely unsupervised and never sees text during its training. Learn more:

Top 137 Links of The Week

It was a busy week, with a lot of demo links. I reviewed over 500 to narrow these down.  Just pick the ones you think look cool.

Apple AI News: Week Ending 06/14/2024

“Apple just announced a ton of incredible AI developments at WWDC. The most impressive reveals: 

1. Using the iPad calculator as a notepad and getting real-time answers 

“2. Apple Intelligence: Apple’s first AI system is coming to the iPhone, iPad, and Mac 

“Everyone’s expecting a reborn Siri at WWDC today. Well, Apple already published a paper on it that disclosed way more details than what we expect from Apple. It’s called “Ferret-UI”, a multimodal vision-language model that understands icons, widgets, and text on iOS mobile 

“As always, Apple demonstrates their dominance in tech as they unveil the future of integrated generative AI services into their full stack of devices and software. Check out what a truly integrated AI assistant, system-wide, looks like with their new Siri: #WWDC 

“Apple and OpenAI partner up to directly integrate ChatGPT into iOS 18, iPadOS 18, and macOS 

“ChatGPT integration isn’t as deep as I expected. Does this mean Apple trained their own language and image models for everything else they showcased?” / X (Short answer – YES)

“very happy to be partnering with apple to integrate chatgpt into their devices later this year! think you will really like it.” / X

“Siri can now take commands *within* apps 

“Apple AI on device is equivalent to other tiny models. Apple AI Server is equivalent to GPT-3.5ish Frontier models they are not, but will still give you supercharged Siri & okay-ish writing help. I don’t think this has much to do with high-end use cases. 

“Everyone gets an exec assistant with Siri. Siri will summarize your emails, with intelligence summaries, (not just first lines on the email) Also time sensitive emails will be prioritized at the top (dinner tonight, flights) 

“Apple’s AI will rewrite your emails (standard and expected feature) and summarize emails. 

Private Cloud Compute

“Many people are mocking Apple’s “Private Cloud Compute” as any other cloud service with the same guarantees and privacy concerns. Please, read a bit before you make yourself a clown before everyone else. Some notes about Apple’s Private Cloud Compute:  

1. Apple can’t access your data, even if they want to. Under any circumstances. 

2. Your data is never retained, logged, or used for debugging.

3. An attacker can’t access a specific user’s data without compromising the entire system. Even if an attacker gets physical access to a node, they won’t be able to compromise the rest.

4. Security researchers have access to verify Apple’s privacy and security guarantees.

5. Apple is using custom-built server hardware using Apple silicon. They use the same Secure Enclave and Secure Boot they’ve used on the iPhone.

6. They created a new secure operating system with a narrow attack surface.

7. They replaced existing general-purpose cloud components that everyone uses (like remote shells and observability tools) with custom, more secure components.

There’s a ton of serious security research and engineering behind the Private Cloud Compute. Apple has been working on it for years, and the result looks really good.

“Ok I take it back. Apple’s ‘Private Cloud Computing’ actually takes ‘Confidential Computing’ to the next level. It’s SO secure that they can’t even comply with law enforcement requests. > No data retention (unlike every other cloud provider) > No privileged access (even Apple 

“Data aggregation with privacy: Apple can enable you to find your content across apps, for example finding your Drivers licence number and then filling a form for you. 

Apple’s AI promise: “Your data is never stored or made accessible to Apple” | Ars Technica

“How will Apple walk the fine line of massively deploying AI on phones and computers w/o compromising privacy? Good blog post explaining how Apple wants to walk this fine line 🧵” / X

“One thing that is clear when you talk to OpenAI & you could see in the Apple announcement today is that they are very comfortable with the idea that AI is able to figure stuff out. Lots of complex API connections & contingencies? Sure, the AI will (soon) be able to figure it out” / X

Apple Intelligence: every new AI feature coming to the iPhone and Mac – The Verge

Introducing Apple Intelligence for iPhone, iPad, and Mac – Apple

“Creators are gonna LOVE iPhone mirroring — so they can do all their posting remotely via native apps for X, Instagram, YT shorts etc which often have a lot more functionality than posting via the web client. 

“iPhone Mirroring! Now you can use your iPhone on your Mac, while using Mac Virtual Display on AVP, so you never have to take it off! I can finally drop my Studio Displays and Pro Display XDR! 

“iPadOS 18 Math Notes feature is WILD! 🔥🔥 No doubt about this. #WWDC24 

Apple is bringing RCS to the iPhone in iOS 18 – The Verge

“Apple’s reality distortion field is strong. It’s kinda wild that with “semantic index,” Apple is basically doing what Microsoft wants to do with AI recall + Copilot, and without any of the big brother backlash. Semantic index means all your private content (messages, emails,” / X

OpenAI Links

“OpenAI also added retired U.S. Army General and former NSA head Paul M. Nakasone to its Board of Directors today. Nakasone previously led the NSA from 2018 to 2023, and will help improve AI’s role in cybersecurity. 

OpenAI adds former NSA chief Paul Nakasone to the board

OpenAI annualized revenue doubles to hit $3.4B: report | Seeking Alpha

OpenAI welcomes Sarah Friar (CFO) and Kevin Weil (CPO) | OpenAI

“Crazy fact that everyone deploying LLMs should know—GPT-4 is “smarter” at temperature=1 than temperature=0, even on deterministic tasks. I honestly didn’t believe this myself until I tried it, but shows up clearly on our evals. ht to @eugeneyan for the tip! 

“One thing that is clear when you talk to OpenAI & you could see in the Apple announcement today is that they are very comfortable with the idea that AI is able to figure stuff out. Lots of complex API connections & contingencies? Sure, the AI will (soon) be able to figure it out” / X

Video Links

Kling/KWAI

“Less than 48 hours ago, Sora competitor Kling dropped. People are already getting access and creating wild AI videos. 🤯 1. MadMax Beer commercial made in 1 hour 

“2. Golden retriever puppy bouncing through tall grass. 

“3. People eating melons and watermelons 

“4. Eating tacos at the beach 

“5. a shoebill stork on the red carpet, at a movie premiere 

“6. A starfighter walks through an alien jungle, surreal, sci-fi movie, 8K 

“7. The huge shark in the water swam over and opened its mouth wide 

“8. Man touching wheat 

“9. A young man is skiing, fresh snow covering the woods, winter sunlight, light through the branches, realistic photo, rich details 

“10. An adorable otter with a happy atmosphere surrounded by splashing water and floating twinkle twinkle little stars. 

“Both videos are AI generated, one in 2023 and the other in 2024. This is how rapid the progress in AI is! 

Luma

“wow. The new model from @LumaLabsAI extending images into videos is really something else. I understood intuitively that this would become possible very soon, but it’s still something else to see it and think through future iterations of. A few more examples around, e.g. the” / X

“Luma AI just dropped a Sora-like AI video generator called Dream Machine. But unlike Sora or KLING, it’s completely open access to the public. Here are 10 wild examples (and how to access it): 1. 

“Luma’s ‘dream machine’ is a much needed step up in AI video tools. No more ‘Ken Burns’ style clips with minimal movement – we’re talking real, dynamic action. Still craving control tho. Like most image-to-video AI, it’s kinda like working with a chaotic golden retriever you’re 

Luma Dream Machine

“Here are some stand alone shots generated with @LumaLabsAI #LumaDreamMachine 

More examples of Luma

Animating the meme of the boyfriend looking back at another woman
“I’ve been wanting to get to the bottom of this story for so long. #DreamMachine #LumaAI 😂 

Animating the meme of the kid in front of the fire
“What a thrill to use #DreamMachine to animate even the most collectors memes 😊 

Another model that is amazing to watch: real-time language control (aka give commands to the subject in the video, and the person responds!)
“🔥Introducing Pandora 🌏 🪐 a World Model that generates videos of world states with real-time language control 🎥🕹️ Simulate the world across domains in an _interactive_ way! check out more 

Publishing Links:  

Apple brings free call recording and transcription to iPhones; journalists rejoice | Nieman Journalism Lab

“Today we’re announcing that @Particle_news has raised $10.9mm in Series A funding, led by @mignano at @LightspeedVP, fueling our mission to deliver the best, personalized news experience. 

SEO is dead: AI will rescan the internet and make duplicate content worthless. – Ethan B. Holland – https://ethanbholland.com/2024/07/05/seo-is-dead-ai-will-rescan-the-internet-and-make-duplicate-content-worthless/

“What If We Recaption Billions of Web Images with LLaMA-3 ? – Finetunes a LLaVA-1.5 and recaptions ~1.3B images from the DataComp-1B dataset – Opensources the resulting dataset data: 

“‘What If We Recaption Billions of Web Images with LLaMA-3?’🤯 And the results confirm that this enhanced dataset, Recap-DataComp-1B generated this way, offers substantial benefits in training advanced vision-language models. For discriminative models like CLIP, we observe 

Perplexity was planning revenue-sharing deals with publishers | Semafor

“Perplexity has been the #2 referral source for Forbes (behind only Wikipedia) and the top referrer for other publishers. We have been working on new publisher engagement products and ways to align long-term incentives with media companies that will be announced soon. Stay tuned! 

AI Startup Perplexity Is Directly Ripping Off Content From News Outlets

“Just done something with base level ChatGPT to reversion a video interview into a coherent article and it’s frighteningly good. A few steps, but within 10 minutes the newsroom content miracle happens. This stuff works.” / X

“Very basic but very useful stuff. A news broadcast interview put through @Grabyo then @TrintHQ and @ChatGPTapp to create a coherent article that needs minimal editing. This should be basic newsroom routine by now (but it isn’t)” / X

Audio Links:  

Suno

“Dog noises into a lo-fi beat to relax/study to. Awoo’d by Calvin (Pup of Ian’s, Suno Design Lead) #SunoPets #SunoAudioInput 

“We are excited to release our Audio Input feature, where you can make a song from any sound! 🎉 All Pro & Premier users can now upload or record their own audios to make any song you can dream of. We’ve already seen creators find musical inspiration from street sounds and jam 

Augmented and Virtual Reality (AR/VR) Links:

Other AR/VR News

“A remarkable example of human’s extraordinary ability to do “cross-embodiment transfer”: the person steers himself in 3rd person “NPC view” by streaming the camera feed from a drone following him to his VR headset. We are born to process egocentric view, yet we have no trouble 

“3D capture is dope for delight, but the utility might be even more impactful. This digital twin of Boston fuses together aerial 3d scans + vector maps + building (BIM) data + historical crime data + utility waterlines + zoning data. You can answer questions like: 1. Which 

Multimodality Links:

“Depth Anything V2 This work presents Depth Anything V2. Without pursuing fancy techniques, we aim to reveal crucial findings to pave the way towards building a powerful monocular depth estimation model. Notably, compared with V1, this version produces much finer and more 

“TikTok presents Depth Anything V2 Trained from 595K synthetic labeled images and 62M+ real unlabeled images, providing the most capable monocular depth estimation model proj: 

Agents and Copilots Links:

“This is a great little example of how simple agent-based systems can lead to emergent behavior, even with tiny AIs like Apple’s on-device LLM. Matt Webb built a demo AI smart home, when he asks it “turn on the light for my dog” the home figures out how. 

“Synthflow, an AI startup building conversational AI agents a new multilingual update to the platform. AI call assistants can now communicate in nearly any language, in any voice 

“Today we’re announcing the $6M seed round for @brightwaveio led by @DecibelVC and with participation from @p72vc and @Moonfire_VC. We are building an AI research assistant that generates insightful, trustworthy financial analysis on any subject and have customers with assets 

Fueling the Future: CaseMark Raises $1.7M to Empower Attorneys with AI

How to Use AI to Create Role-Play Scenarios for Your Students | Harvard Business Publishing Education

Michael Kors first to debut Shopping Muse, the AI-powered shopping assistant from Dynamic Yield by Mastercard | Mastercard Newsroom

LinkedIn leans on AI to do the work of job hunting | TechCrunch

How Amazon blew Alexa’s shot to dominate AI, according to employees who worked on it | Fortune

“Zeta Labs came out of stealth, a new AI agent startup built by former Meta Engineers. The startup unveiled ‘Jace,’ an AI agent capable of executing complex browser tasks autonomously, similar to CognitionLabs’ Devin. 

“Today we’re thrilled to introduce Jace, your AI employee. Jace goes beyond AI chatbots by being able to handle longer-running tasks and taking actions in the digital world. By using our new AWA-1 (Autonomous Web Agent) model, Jace can use a browser to interact with websites 

Morgan Stanley CEO says AI could save financial advisers 10-15 hours a week | Reuters

Anthropic Links:

Claude’s Character \ Anthropic

Artificial General Intelligence (AGI) Links:

“Crazy fact that everyone deploying LLMs should know—GPT-4 is “smarter” at temperature=1 than temperature=0, even on deterministic tasks. I honestly didn’t believe this myself until I tried it, but shows up clearly on our evals. ht to @eugeneyan for the tip! 

Business and Enterprise Links:

Introducing Shutterstock ImageAI, Powered by Databricks: An Image Generation Model Built for the Enterprise – Databricks

Chips, Hardware, and Infrastructure Links:

Nemotron-4 340B | Research

We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows the distribution, modification, and use of the models and their outputs. 

NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models | NVIDIA Blog

“Nvidia presents HelpSteer2 Open-source dataset for training top-performing reward models High-quality preference datasets are essential for training reward models that can effectively guide large language models (LLMs) in generating high-quality responses aligned with human 

Education AI News Links:

Dynamic math solutions via multimodality is incredible to watch

Ethics/Legal/Security Links:

OpenAI insider stock sales are raising concern among ex-employees

Electoral Commission denies Newcastle firm’s attempt to stand AI candidate for PM – Prolific North

Uncensor any LLM with abliteration

“An artificial intelligence start-up says it has found thousands of vulnerabilities in popular generative AI programs and released a list of its discoveries. 

Stealing everything you’ve ever typed or viewed on your own Windows PC is now possible with two lines of code — inside the Copilot+ Recall disaster. | by Kevin Beaumont | May, 2024 | DoublePulsar

“All of our security and verification systems are built around very specific intuitions about things humans can do that machines cannot (& we are still fooled a lot). We are very unready for how LLMs will overwhelm these systems. This example is one of many. 

“We should eliminate electronic voting machines. The risk of being hacked by humans or AI, while small, is still too high.” / X

“You put him there because an employee rightfully pointed out that your opsec sucks, and you decide that if you’re going to become nation state’s target you’re going to need to build a nation state’s defenses. You ask the NSA whether they can protect you and they tell you” / X

AI candidate running for Parliament in the U.K. says AI can humanize politics

Yes, artificial intelligence is running for mayor of Cheyenne; city, county clerks comment on candidate VIC – Casper, WY Oil City News

“The US is going to lose its leadership in AI if it doesn’t support more open research and open-source AI!” / X

“GPT-4 creates persuasive political ads: “messages generated by GPT-4 were broadly persuasive, in some cases increasing support for an issue stance by up to 12 percentage points.” But micro targeting with user data had no effect. Why? 🤷 But I suspect better prompting would work. 

Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference – Apple Machine Learning Research

Elon Musk drops suit against OpenAI and Sam Altman

Elon Musk withdraws lawsuit against OpenAI | Reuters

Google Links:

“With @Harvard, we built a ‘virtual rodent’ powered by AI to help us better understand how the brain controls movement. 🧠 With deep RL, it learned to operate a biomechanically accurate rat model – allowing us to compare real & virtual neural activity. → 

Imagery Links:

“Sharing an update on changes and clarifications coming to Adobe’s Terms of Use to address customer concerns. As technology evolves, so must our Terms of Use. Learn more: 

“A paper origami owl, a felt robot, a skater made of clay… here’s how Imagen 3 can generate visually rich images with complex textures. 👀 

“Midjourney default image quality with style personalization turned on is significantly better 

“Midjourney just released a new feature called ‘model personalization’ It lets you tune the MJ algorithm to your own personal tastes, removing much of the MJ “bias” that comes from its training data Breakdown of how it works: 

“interesting that 99% of AI videos made with tools like Luma, Pika, and runway are generated from images originally created in Midjourney 😏” / X

“Stability AI released the open model weights for Stable Diffusion 3 Medium. The 2B parameter text-to-image model offers “advanced photorealism, prompt understanding, and typography capabilities” 

Stable Diffusion 3 Medium — Stability AI

Locally Run Links:

“Everyone’s expecting a reborn Siri at WWDC today. Well, Apple already published a paper on it that disclosed way more details than what we expect from Apple. It’s called “Ferret-UI”, a multimodal vision-language model that understands icons, widgets, and text on iOS mobile 

“Love how @Apple is advocating for on-device AI at WWDC . Local, smaller, specialized models are the future of private, secure and efficient AI.” / X

“🚀 Chat with MLX 0.2 is here – a whole new LLM experience on your Apple Silicon Mac! 🍎 – Revamped UI/UX – Fully-featured Chat UI – Chat, Completion, Model Manager – Better and Faster RAG Upgrade your AI conversations now! 💻 GitHub – 

Mobile News Links:

“NOTHING PHONE CEO: “In the Future There Will Be No More Apps” In the Nothing YouTube channel, CEO Carl Pei revealed his groundbreaking vision for the future of AI in smartphones. With over 4 billion users and 1 billion smartphones shipped annually, Pei emphasized the need for a 

Open Source Links:

Nemotron-4 340B | Research

We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows the distribution, modification, and use of the models and their outputs. 

NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models | NVIDIA Blog

“Nvidia presents HelpSteer2 Open-source dataset for training top-performing reward models High-quality preference datasets are essential for training reward models that can effectively guide large language models (LLMs) in generating high-quality responses aligned with human 

“Alibaba’s open-source Qwen 2-72B model moved into the top spot on Hugging Face Open LLM Leaderboard This ranks the model ahead of Mixtral and Llama-3 across a range of benchmarks Pretty wild to continue to see the pace of open-source AI 

“Stability AI released the open model weights for Stable Diffusion 3 Medium. The 2B parameter text-to-image model offers “advanced photorealism, prompt understanding, and typography capabilities” 

Stable Diffusion 3 Medium — Stability AI

Podcasts/YouTube/Op-Eds Links:

Karpathy recreates GPT-2 in four hours on YouTube

“📽️ New 4 hour (lol) video lecture on YouTube: “Let’s reproduce GPT-2 (124M)” 

“New simulation hypothesis drop. Maybe the simulation is not physical and exact but neural and approximate. i.e. not about simulating fields or particles with physical equations but a giant Diffusion Transformer++ creating a large “dream”.” / X

Robotics and Embodiment Links:

Train a prosthetic limb by thinking about moving your phantom limb, until it works.

“AI and robotics is on the verge of transforming prosthetic limbs. Watch @AtomLimbs’ tech in action. 

Tesla claims it has 2 Optimus humanoid robots working autonomously in factory | Electrek

Science and Medicine Links:

Train a prosthetic limb by thinking about moving your phantom limb, until it works.

“AI and robotics is on the verge of transforming prosthetic limbs. Watch @AtomLimbs’ tech in action. 

Twitter/X/Grok Links:

Elon Musk reconsiders phone project after Apple Intelligence OpenAI integration

“If Apple integrates OpenAI at the OS level, then Apple devices will be banned at my companies. That is an unacceptable security violation.” / X

OpenAI CTO Responds to Elon Musk Calling Apple Partnership ‘Creepy Spyware’ – YouTube

“New code was spotted within Grok’s code over the weekend If accurate, then xAI is probably planning to integrate image generation within Grok soon. Back in February, Elon also teased a potential Midjourney partnership: 

“NEWS: xAI might be working on image generation 👀 

https://twitter.com/xDaily/status/179931938268816595

The Rest: AI News of The Week

Don’t let the volume overwhelm you.  Have fun and skim these. The links are organized by topic, sorted from ‘coolest’ to ‘least cool’, and each topic is clearly defined with a headline.  I’ve added a description and glossary of what the topics mean, beneath each label, in plain language.  I do the work so you don’t have to!   When you visit the pages, note that the links and descriptions are often pulled directly from tweets or articles, so it’s not always my voice.  Pause when you see something that interests you.  Reach out to me any time. I enjoy sharing and discussing these items.

Agency/Agents/Copilots News of the Week: Agency is when AI can do things for you (like Googling an actress’s name or fetching the latest weather forecast). An agent is one step further, when AI is given autonomy to take action on your behalf (“Alexa, book a reservation for three at Peak in Hudson Yards for Friday night”). A co-pilot is an assistant (like spell check or autofill).
This week’s latest agent news: https://ethanbholland.com/2024/06/14/agents-and-copilots-ai-news-week-ending-06-14-2024/

Amazon News of The Week: Individual company products will often be placed in the categories they match (image, audio, agents, robots, etc). Occasionally, I’ll dedicate space to a company’s news if it’s broad or a major product release.
This week’s latest Amazon AI news: https://ethanbholland.com/2024/06/14/amazon-ai-news-week-ending-06-14-2024/

Anthropic News of the Week:
Anthropic is a company that builds LLMs, as do OpenAI, Mistral, Meta, etc. Their main AI brand is Claude. As with Amazon and Apple, individual Anthropic company posts will often be placed in the categories they match (image, audio, agents, robots, etc). Occasionally, I’ll dedicate space to a company’s news if it’s broad or a major product release.
This week’s Anthropic news: https://ethanbholland.com/2024/06/14/anthropic-news-week-ending-06-14-2024/

Apple News of the Week: As with Amazon, individual Apple company products will often be placed in the categories they match (image, audio, agents, robots, etc). Occasionally, I’ll dedicate space to a company’s news if it’s broad or a major product release.
This week’s latest Apple AI news: https://ethanbholland.com/2024/06/14/apple-ai-news-week-ending-06-14-2024/

Artificial General Intelligence (AGI) News of the Week: Artificial General Intelligence, in a nutshell, is when artificial intelligence is able to beat humans at everything (including embodying physical forms and completing physical tasks).  It’s usually a thought catalyst for predictions, like when AGI will occur. 10 years? 25 years? 100? AGI is an event horizon that is tough to define, tough to imagine, and tough to predict. OpenAI defined AGI in its charter as “highly autonomous systems that outperform humans at most economically valuable work”. OpenAI has a section of its website dedicated to AGI. Google’s DeepMind published my favorite report on the five levels of artificial intelligence on the way to AGI.
This week’s latest Artificial General Intelligence (AGI) news: https://ethanbholland.com/2024/06/14/artificial-general-intelligence-agi-news-week-ending-06-14-2024/

AI Audio News of the Week: In this case, AI audio can mean a few things. The first is “generative audio” which refers to creating sounds with AI, much like ChatGPT writes words or MidJourney creates images. For example, asking for the “sound of waves crashing on the beach” would be text to sound. Another example would be an AI ‘watching’ a video and adding sound to it, like a foley artist would add footsteps or a creaking door to a movie scene. Lastly, AI audio can refer to microphones that only pick up certain speakers’ voices or headsets that cancel out all voices but your friends’.
This week’s latest AI audio news: https://ethanbholland.com/2024/06/14/audio-news-week-ending-06-14-2024/

Autonomous Vehicles/Driverless Cars News of the Week: Driverless car news doesn’t always get its own category, because it’s so close to robot embodiment. I go with my gut each week around what to place in each category. My recommendation would be to follow Robotics/Embodiment also, as the two fields are converging.
This week’s autonomous vehicle news: https://ethanbholland.com/2024/06/14/autonomous-vehicles-news-week-ending-06-14-2024/

Augmented and Virtual Reality (AR/VR) News of the Week: Augmented reality is when you see images or information on top of the real world.  A car windshield with a heads-up display of the speed. Or glasses that have facial recognition and overlay the names of everyone in view. Virtual reality is when you are transported into another place, usually wearing goggles, but a flight simulator could also be considered virtual reality.
This week’s latest AR/VR news: https://ethanbholland.com/2024/06/14/augmented-and-virtual-reality-ar-vr-news-week-ending-06-14-2024/

Business/Enterprise News of the Week: This broad category is for stories that impact corporations and large-scale AI implementation. Enterprise refers to a type of AI that is often custom built for a business or leverages an API to connect secure data to an AI model.
This week’s latest enterprise AI news: https://ethanbholland.com/2024/06/14/business-and-enterprise-ai-news-week-ending-06-14-2024/

Chips and Hardware AI News of the Week: Most chip news is usually about NVIDIA, yet more and more, Meta, Google, and OpenAI are moving toward their own manufacturing. I have to make the call whether to put Meta, Google, and OpenAI’s chip news under this section or their company sections. Lately, I’m putting each company’s chip news into the company category, rather than the chips category. This is the rest of the chip headlines.
This week’s latest chips and hardware news: https://ethanbholland.com/2024/06/14/chips-hardware-and-infrastructure-week-ending-06-14-2024/

Education AI News of the Week: There is a lot of buzz around the impact of AI in education. This section focuses both on the risks and rewards of how AI can impact learning. It’s broader than just K-12 and includes things like skills, trade, professional, and higher education. This is not about how to learn AI, it’s about AI’s impact on learning.
This week’s latest education news: https://ethanbholland.com/2024/06/14/education-ai-news-week-ending-06-14-2024/

Ethics/Legal/Security AI News of the Week: This section focuses on the impact AI is having on ethics (deep fakes, war, trust, false information, plagiarism, job loss, income), legal (rights, laws, regulations), and security (hacking, phishing, national interests, safety). For huge news stories like the NY Times suing OpenAI, I usually put them under the main section or give them their own page.
This week’s latest AI ethics/legal/security news: https://ethanbholland.com/2024/06/14/ethics-legal-security-ai-news-week-ending-06-14-2024/

Google AI News of the Week: Individual company products will often be placed in the categories they match (image, audio, agents, robots, etc). Occasionally, I’ll dedicate space to a company’s news if it’s broad or a major product release.
This week’s latest Google AI news: https://ethanbholland.com/2024/06/14/google-ai-news-week-ending-06-14-2024/

Imagery News of the Week: AI imagery covers “generative AI” image tools. This is usually text-to-image, where a user enters a prompt (“a polar bear walking through NYC”) and a tool like DALL-E or MidJourney generates an image in the likeness of the description. This is different than AI vision, where an AI “looks at” an image and can derive context, details, and contents. AI vision is a subset of AI called multimodality. Imagery, in this case, is for image creation and modification/editing. Adobe Photoshop’s AI tools would fall into this category. I’ll also include things like automatic masking and object removal, even though that’s in between imagery and vision… but practically speaking it fits into editing.
This week’s latest AI image news: https://ethanbholland.com/2024/06/14/imagery-news-week-ending-06-14-2024/

International AI News of the Week: A lot of international news will get cross-listed in the chips, security, or open-source categories; however, it’s nice to have a separate category for worldwide AI news.
This week’s latest international AI news: https://ethanbholland.com/2024/06/14/international-ai-news-week-ending-06-14-2024/

Locally Run AI Models News of the Week: This is a niche mostly for serious AI followers. It refers to AI that can be privately downloaded and run on a device without an internet connection. These have an array of powerful implications, from ethics of rogue users with untethered agents, to practical uses like Apple running a full AI on your phone, to corporate installations for security, to embodied robots with AI running in their virtual brain.
This week’s latest locally run AI news: https://ethanbholland.com/2024/06/14/locally-run-ai-models-news-week-ending-06-14-2024/

Meta AI News of the Week: This is a space dedicated to Meta-specific AI advancements and news stories.
This week’s Meta AI news: https://ethanbholland.com/2024/06/14/meta-ai-news-week-ending-06-14-2024/

Microsoft AI News of the Week: This is a space dedicated to Microsoft-specific AI advancements and news stories.
This week’s Microsoft AI news: https://ethanbholland.com/2024/06/14/microsoft-ai-news-week-ending-06-14-2024/

Mobile AI News of the Week: In April 2024, I added a dedicated category for mobile. Before that, I put most of the mobile news into either the company categories (Apple v. Google v. Microsoft) or locally run AI. It also ended up in the chips and hardware section, or the consumer products category. There is enough mobile news to at least start cross-linking it all in one place.
This week’s latest mobile AI news: https://ethanbholland.com/2024/06/14/mobile-news-week-ending-06-14-2024/

Multimodal AI News of the Week: This is a broad topic for a single AI model that demonstrates an ability to interact with more than one modality (imagery, video, audio, text). Often multimodal news will end up in one of these categories. I’m playing it by ear on a case-by-case basis. Please be patient with my organizational challenges.
This week’s multimodal AI news: https://ethanbholland.com/2024/06/14/multimodality-news-week-ending-06-14-2024/

OpenAI: OpenAI is the leading force in the AI boom of 2023 and now 2024. This section focuses on news that is specific to OpenAI. This section will compete with all of the other sections (imagery, vision, ethics, etc) because OpenAI is so broad. I won’t be able to consistently pick when to put things under OpenAI or other sections, so bear with me.
This week’s latest OpenAI news: https://ethanbholland.com/2024/06/14/openai-news-week-ending-06-14-2024/

Open Source Models: An open source AI model refers to a class of artificial intelligence models with public source code. They can be inspected, copied, installed, and customized on private computers. In contrast, a closed source model is proprietary and owned by a company that you pay to use (like PowerPoint or Photoshop). One of the most famous open source language models is a French model called Mistral. Its code is completely publicly available, and anyone can download it and customize it. On one hand, open source is a transparent and powerful way to democratize AI, but on the other hand, open source models circumvent the guard rails and copyright protections that private companies implement. Open source models are the wild west of artificial intelligence, but also the potential saving grace (depending on who you ask). It’s a bit like gun control debates but for computing power.
This week’s latest open source news: https://ethanbholland.com/2024/06/14/open-source-ai-news-week-ending-06-14-2024/

Perplexity News of the Week:
Perplexity is renowned for its advanced search and information retrieval technologies. In 2024, they introduced “Perplexity Pages,” a tool transforming AI-driven research into detailed, shareable web pages. However, in 2024, the company also faced allegations of content theft, with claims that its AI-generated articles improperly replicate work from other sources.
This week’s latest Perplexity news: https://ethanbholland.com/2024/06/14/perplexity-news-week-ending-06-14-2024/

Podcast/YouTube Clips of the Week: This is for more general interviews and explainer videos and podcasts that provide access to leadership, demos of new products, and walkthroughs and tutorials. Videos focused on specific topics will live in the topic category (e.g., images), but broader videos will live here.
This week’s latest podcasts and YouTube clips: https://ethanbholland.com/2024/06/14/podcasts-youtube-op-eds-week-ending-06-14-2024/

Publishing AI News of the Week: These are stories about AI’s impact on the publishing industry. From copyright and crawling to the death of page views or even the end of browsers.
This week’s latest publishing AI news: https://ethanbholland.com/2024/06/14/publishing-news-week-ending-06-14-2024/

RAG (Retrieval-Augmented Generation) News of the Week: RAG allows a language model to “reference an authoritative knowledge base outside of its training data sources before generating a response” (via Amazon). Historically, RAG was prone to hallucinations; however, new methods are improving the reliability. There is enough news about RAG that I want to start tracking it separately for my own use.
This week’s latest RAG (Retrieval-Augmented Generation) AI news: https://ethanbholland.com/2024/06/14/rag-retrieval-augmented-generation-news-week-ending-06-14-2024/

Robotics/Embodiment News of the Week: This is the most intense area of AI. Embodiment refers to putting an AI inside of a machine. It’s “embodying” the object and therefore giving a robot agency in the real world. An example would be using a large language model as an interface to a complex coding task. Just as you ask “Alexa, play Bad Blood by Taylor Swift on Spotify” using plain language, with embodiment you could ask a robot to “Go to the laundry basket and bring me all of the red shirts”. The language model in the robot would translate your request into the proper code to go get the red shirts. The robot was never trained on the task. Another type of embodiment would be training a robot using virtual reality simulations. Using a simulation, a robot could be trained on thousands of scenarios until the real world can be swapped out and the robot doesn’t “notice”. This section also includes factory automation and human prosthetics. There will be some overlap with other categories like autonomous vehicles. I first learned about embodiment from Alan Thompson. I highly recommend his video explainer: https://youtu.be/peLqYP9BAUg?si=2FzrvDlw-qaQFaCx.
This week’s latest robot and embodiment AI news: https://ethanbholland.com/2024/06/14/robotics-and-embodiment-news-week-ending-06-14-2024/

Science/Medicine AI News of the Week: AI’s strength is learning patterns. This applies nicely to medical diagnosis and identifying trends. When combined with data and AI vision, this means AI is good at looking at x-rays. Language models are helping with patient interface, and robotics and augmented reality are advancing surgery. Powerful enterprise models like Google’s AlphaFold can master protein folding. Other models can read ancient scrolls without opening them.
This week’s latest AI science and medicine news: https://ethanbholland.com/2024/06/14/science-and-medicine-news-week-ending-06-14-2024/

AI Video News of the Week: AI video in this case refers to generative video, much like imagery meant generative imagery. This is usually text-to-video, where a user enters a prompt (“a wizard walking out of a flaming building”) and a tool like Pika or Runway generates a video in the likeness of the description. It also covers animation of still images, where an image is given motion (like a photo of a waterfall appearing to have flowing water). As with images, this is different than AI vision, where an AI “looks at” an image or video and can derive context, details, and contents. Video, in this case, is video creation and modification/editing.
This week’s latest AI video news: https://ethanbholland.com/2024/06/14/video-news-week-ending-06-14-2024/

X/Twitter/Grok: Grok is one of several AIs developed by X, and it’s a bit blended in with Tesla and other Elon Musk technology. Not every week will have a Grok section, but like Meta, Google, Apple, and OpenAI, X will be in the news enough to have its own section.
This week’s latest X news: https://ethanbholland.com/2024/06/14/twitter-x-grok-week-ending-06-14-2024/

Technical and AI Developer News of the Week: Everything that is too technical for general consumption goes here. These are stories I think are important, but might be inaccessible and confusing. It’s also a space for developer news and deep dives into how AI works, under the hood.
This week’s technical and dev AI news: https://ethanbholland.com/2024/06/14/tech-papers-training-and-development-week-ending-06-14-2024/

Credits/Sources

Most of these weekly links come from just a few prolific oversharing sources. Please follow them, as they work hard to find the news each week and they make it a lot easier for me to compile.

For previous issues, please visit the archives!

Thanks for reading!
