a movie theater in a forest with a trail sign that reads “Multimodal” --ar 5:3 --style raw
This week’s category cover theme is a sign in a forest. Each category image prompt is derived from the formula “a [category-themed object] in a forest with a trail sign that reads ‘[category name]’”. Using a theme each week cuts cover creation time to about 20 minutes, down from several hours.
Blog – DeepDataSpace | Unleashing the Power of Cutting-Edge Computer Vision Technology
“What makes up the abstract concept of an apple? We read the word “apple” as a string, see 2D pictures online, 3D shape in real life, and moving apples in videos. We touch the apple, feel its geometry in our palms and texture through the rich tactile sensation on our fingers. Do all these different modalities converge to the same representation space, given sufficient learning capacity? After all, they are all shadows of one “true reality” projected onto our different senses. I really like this study paper from MIT, “The Platonic Representation Hypothesis”. The authors show that highly capable LLMs and vision models learn very similar representations, even though the modalities are never explicitly co-trained. Concretely, the experiments compare the similarity between strings “apple” and “orange” with the similarity between a picture of an apple vs orange. These two turn out to agree with each other in a wide selection of off-the-shelf models.
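To make the comparison in that quote concrete, here is a minimal sketch of the idea: take embeddings of the same concepts from a text model and from a vision model, compute the pairwise similarities inside each space, and check whether the two sets of similarities agree. The vectors below are invented toy values, not output from any real model, and the Pearson correlation used here is a simple stand-in for illustration rather than the alignment metric used in the paper.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarities(embeddings, names):
    """Cosine similarity for every unordered pair of concepts, in a fixed order."""
    return [cosine(embeddings[a], embeddings[b]) for a, b in combinations(names, 2)]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy embeddings (assumption: made-up numbers for illustration).
# The two spaces have different dimensions, as real text and vision models would.
text_emb = {
    "apple": [0.9, 0.1, 0.0],
    "orange": [0.8, 0.3, 0.1],
    "car": [0.0, 0.2, 0.9],
}
image_emb = {
    "apple": [0.2, 0.9, 0.1, 0.0],
    "orange": [0.3, 0.8, 0.0, 0.1],
    "car": [0.9, 0.0, 0.3, 0.2],
}

names = ["apple", "orange", "car"]
text_sims = pairwise_similarities(text_emb, names)
image_sims = pairwise_similarities(image_emb, names)

# A value near 1 means both spaces rank the concept pairs the same way.
alignment = pearson(text_sims, image_sims)
print(f"text similarities:  {[round(s, 3) for s in text_sims]}")
print(f"image similarities: {[round(s, 3) for s in image_sims]}")
print(f"alignment (Pearson): {alignment:.3f}")
```

The two models never need to share a vector space for this to work: only the similarity structure within each space is compared, which is why models trained on different modalities can be lined up against each other at all.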
“The PaliGemma vision-language model is included as part of the latest KerasNLP release! Works with JAX, TF, and torch. There’s a lot you can do with it: describing images, captioning, object detection and image segmentation, OCR, visual question answering… it even has” / X
OpenAI
“This is a big leap forward to doing real data analysis with GPT-4o. It still can’t quite handle the way many people use spreadsheets (Excel formulas, etc.) but it is pretty impressive. You can zoom into spreadsheets and ask about cells, modify graphs, etc. I trimmed 45 seconds.
PaliGemma: Open Source Multimodal Model by Google
Grok
Elon Musk’s xAI is working on making Grok multimodal – The Verge
Phi
“Phi-3-vision with 4.2B parameters
Segmentation
“In honor of the playoffs, I’d like to showcase what we’ve been working on here at Nexavision — a new way to generate basketball analytics through tracking with computer vision and AI: 🧵 https://twitter.com/AmarSVS/status/1793037268690579787

Heads up! You’ve scrolled to the end of this category. There may have been just one or two links (above), so go back up and double check to be sure you didn’t quickly scroll down past it.
Be Sure To Read This Week’s Main Post:
This week’s executive overview and top links are here:
AI News #34: Week Ending 05/24/2024 with Executive Summary and Top 47 Links
The post you just read is a deep-dive extension of my weekly newsletter, This Week In AI, an executive summary of the top things to know in AI. Each week, I create an accessible overview so laypeople can feel confident they are conversant with the week’s AI developments. I include a curated list of the week’s must-click links, giving everyone a hands-on opportunity to explore the most intriguing updates in artificial intelligence across various categories, including robotics, imagery, video, AR/VR, science, ethics, and more. Beyond the overview, I post these topic-based deeper dives (below). If you haven’t read this week’s overview, I recommend starting there.
- Agents/Copilots
- Amazon
- Apple
- Artificial General Intelligence (AGI)
- Augmented and Virtual Reality (AR/VR)
- Autonomous Vehicles
- AI Audio
- Business and Enterprise AI
- Chips and Hardware
- Consumer Products
- Education
- Ethics/Legal Security
- Images/Photos
- International AI News
- Locally Run AI Models
- Mobile
- Meta
- Microsoft
- OpenAI
- Open Source
- Podcasts/YouTube
- Publishing and News
- Retrieval-Augmented Generation (RAG) News
- Robots and Embodiment
- Science and Medicine
- Video
- Vision/Multimodality
- X/Twitter/Grok
- Tech and Development
Credits/Sources

Most of these weekly links come from just a few prolific oversharing sources. Please follow them, as they work hard to find the news each week and they make it a lot easier for me to compile.
- Robert Scoble: https://x.com/Scobleizer
- Ethan Mollick: https://www.linkedin.com/in/emollick/
- Alan Thompson: https://lifearchitect.ai/
- Theoretically Media: https://www.youtube.com/@TheoreticallyMedia
- The Rundown: https://www.therundown.ai/
- Bilawal Sidhu: https://twitter.com/bilawalsidhu/
- TLDR: https://tldr.tech/ai
- Jeremiah Owyang: https://twitter.com/jowyang
- Nick St. Pierre: https://twitter.com/nickfloats
- Dr. Jim Fan: https://twitter.com/DrJimFan
- All About AI: https://www.youtube.com/@AllAboutAI
- Marshall Kirkpatrick: https://aitimetoimpact.com/
- AI News (Smol Talk): https://buttondown.email/ainews/archive/
For previous issues, please visit the archives!

Thanks for reading!




