Gemini
“It can “hear” -> Google’s Gemini 1.5 Pro can now process audio w/out needing a transcript “The model can now listen to uploaded audio files and churn out information from things like earnings calls or audio from videos without the need to refer to a written transcript.” https://twitter.com/glenngabe/status/1777746267667365998
Google’s Gemini 1.5 Pro can now hear – The Verge – https://www.theverge.com/2024/4/9/24124741/google-gemini-pro-imagen-updates-vertex
“I uploaded two music tracks to Gemini 1.5 Pro: Black Sea by Drexciya, and Song of Scheherazade by Renaissance, and asked it to analyse them https://twitter.com/sebkrier/status/1778386521319428167
“Can Gemini 1.5 actually read all the Harry Potter books at once? I tried it. All the books have ~1M words (1.6M tokens). Gemini fits about 5.7 books out of 7. I used it to generate a graph of the characters and it CRUSHED it. https://twitter.com/deedydas/status/1778621375592485076
“🎉 It’s a big day for @Google Gemini. Gemini 1.5 Pro now understands audio, uses unlimited files, acts on your commands, and lets devs build incredible things with JSON mode! It’s all 🆓. Here’s why it’s a big deal 👇 🔈 Gemini can hear Gemini understands audio (up to 9.5 https://twitter.com/liambolling/status/1777758743637483562
“Is it just me or was today the first time OpenAI was unable to overshadow a Google AI announcement? Gemini 1.5 Pro is pretty wild. Just dropped in an audio file + hour long video interview, and now it’s helping me package it up for YouTube. Multimodality + 1M context window https://twitter.com/bilawalsidhu/status/1777888008454193476
“Google Gemini 1.5 Pro’s 1,000,000+ token context length is not talked about enough. At launch, 1.5 Pro was overshadowed by OpenAI’s Sora and only accessible to select users. But now, anyone can try it for free. Here’s a quick tutorial 🧵: https://twitter.com/rowancheung/status/1777738501024346575
“Our next-generation AI model Gemini 1.5 Pro is now available in public preview on @GoogleCloud’s #VertexAI platform. Its long-context window is already helping businesses analyze large amounts of data, build AI-powered customer service agents & more. → https://twitter.com/GoogleDeepMind/status/1777738279137222894
Gemini 1.5 Pro Now Available in 180+ Countries; With Native Audio Understanding, System Instructions, JSON Mode and More – Google for Developers – https://developers.googleblog.com/2024/04/gemini-15-pro-in-public-preview-with-new-features.html
Google’s Gemini Pro 1.5 enters public preview on Vertex AI | TechCrunch – https://techcrunch.com/2024/04/09/googles-gemini-pro-1-5-enters-public-preview-on-vertex-ai/
DeepMind
“Our generative technology Imagen 2 can now create short, 4-second live images from a single prompt. 🖼 It’s available to use in @GoogleCloud’s #VertexAI platform. → https://twitter.com/GoogleDeepMind/status/1777747320945234422
“Soccer players have to master a range of dynamic skills, from turning and kicking to chasing a ball. How could robots do the same? ⚽ We trained our AI agents to demonstrate a range of agile behaviors using reinforcement learning. Here’s how. 🧵 https://twitter.com/GoogleDeepMind/status/1778377999202541642
Learning agile soccer skills for a bipedal robot with deep reinforcement learning | Science Robotics – https://www.science.org/doi/10.1126/scirobotics.adi8022
Other Google News
“BREAKING 🔥🤯 Google releases model with new Griffin architecture that outperforms transformers. Across multiple sizes, Griffin out performs the benchmark scores of transformers baseline in controlled tests in both the MMLU score across different parameter sizes as well as the https://twitter.com/rohanpaul_ai/status/1777747790564589844
Google Cloud Next – https://cloud.withgoogle.com/next
Google launches Code Assist, its latest challenger to GitHub’s Copilot | TechCrunch – https://techcrunch.com/2024/04/09/google-launches-code-assist-its-latest-challenger-to-githubs-copilot/
CodeGemma – an official Google release for code LLMs – https://huggingface.co/blog/codegemma
“Google announces Leave No Context Behind Efficient Infinite Context Transformers with Infini-attention This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. A key https://twitter.com/_akhaliq/status/1778234586599727285

Be Sure To Read This Week’s Main Post:
This week’s executive overview and top links are here:
AI News #28: Week Ending 04/12/2024 with Executive Summary and Top 48 Links
The post you just read is a deep-dive extension of my weekly newsletter, This Week In AI, an executive summary of the top things to know in AI. Each week, I write an accessible overview so that laypeople can feel conversant with the week’s AI developments. I also include a curated list of the week’s must-click links, which gives everyone a hands-on way to explore the most intriguing updates in artificial intelligence across categories including robotics, imagery, video, AR/VR, science, ethics, and more. Beyond the overview, I post these topic-based deeper dives (below). If you haven’t read this week’s overview, I recommend starting there.
- Agents/Copilots
- Amazon
- Apple
- Artificial General Intelligence (AGI)
- Augmented and Virtual Reality (AR/VR)
- Autonomous Vehicles
- AI Audio
- Business and Enterprise AI
- Chips and Hardware
- Consumer Products
- Education
- Ethics/Legal Security
- Images/Photos
- International AI News
- Locally Run AI Models
- Mobile
- Meta
- Microsoft
- OpenAI
- Open Source
- Podcasts/YouTube
- Publishing and News
- Robots and Embodiment
- Science and Medicine
- Video
- Vision/Multimodality
- X/Twitter/Grok
- Tech and Development
Credits/Sources

Most of these weekly links come from just a few prolific oversharing sources. Please follow them, as they work hard to find the news each week and they make it a lot easier for me to compile.
- Robert Scoble: https://x.com/Scobleizer
- Ethan Mollick: https://www.linkedin.com/in/emollick/
- Alan Thompson: https://lifearchitect.ai/
- Theoretically Media: https://www.youtube.com/@TheoreticallyMedia
- The Rundown: https://www.therundown.ai/
- Bilawal Sidhu: https://twitter.com/bilawalsidhu/
- TLDR: https://tldr.tech/ai
- Jeremiah Owyang: https://twitter.com/jowyang
- Nick St. Pierre: https://twitter.com/nickfloats
- Dr. Jim Fan: https://twitter.com/DrJimFan
- All About AI: https://www.youtube.com/@AllAboutAI
- Marshall Kirkpatrick: https://aitimetoimpact.com/
- AI News (Smol Talk): https://buttondown.email/ainews/archive/
For previous issues, please visit the archives!

Thanks for reading!




