Ethan B. Holland

Over 56,600 manually organized AI links and counting

Audio: AI News Week Ending 05/15/2026

May 15, 2026

Claude’s Constitution is now an audiobook, read by two of its authors, Amanda Askell and Joe Carlsmith. It includes a Q&A on the writing process, the philosophies that shaped the document, and how it might change as models become more capable. Listen at
https://x.com/AnthropicAI/status/2053881827396653207

You can now listen to me and Joe read out Claude’s constitution as an audiobook. Working on adding the option of listening to it on fast mode 🙂
https://x.com/AmandaAskell/status/2054010971765805486

Meta announced Muse Spark in Voice Mode and Meta Glasses
https://www.testingcatalog.com/meta-to-release-muse-spark-in-voice-mode-and-meta-glasses/

Today we’re introducing Meta AI Voice Conversations powered by Muse Spark that let you talk naturally to Meta AI (interrupt, switch topics, or swap languages), and as you talk, Meta AI can generate images and pull up recommendations from Reels, maps, and more. We’re also bringing
https://x.com/MetaNewsroom/status/2054205287515484397

we launched some muse spark updates yesterday, including muse spark voice and live AI w your camera in Meta AI app + muse spark rolling out to glasses 😎 check them out!
https://x.com/alexandr_wang/status/2054588354914832439

Seeing the demos come together over the last week has been awesome — so many things that previously required a special-purpose model (e.g. real-time translation, event detection in video) turn out to be zero-shot instruction following once you have a general-purpose model with
https://x.com/johnschulman2/status/2053940940885332028

Interaction Models: A Scalable Approach to Human-AI Collaboration – Thinking Machines Lab
https://thinkingmachines.ai/blog/interaction-models/

People talk, listen, watch, think, and collaborate at the same time, in real time. We’ve designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action.
https://x.com/thinkymachines/status/2053938892152435174

Sharing our work on full-duplex multimodal models — real-time interaction that’s natural and intuitive without compromising on intelligence. We started Thinky in part to differentially advance capabilities for human-AI collaboration, which are underemphasized relative to
https://x.com/johnschulman2/status/2053940452789981426

thinking machines is using SGLang btw
https://x.com/eliebakouch/status/2053982248253190180

Thinking Machines know how to surprise. Those simultaneous abilities (not only translation but also creating graph while replying to a question) are pretty remarkable. Can’t wait to try it out and also learn how much it costs to use
https://x.com/TheTuringPost/status/2053975565179253010

Thinky’s secret plan: 1: Increase Human<->AI bandwidth 2: Raise ceiling of human+AI intelligence 3: Help humans continue as main-characters in the new world We are at Step 1. Interaction Models are great real-time collaborative tools for humans. Here’s a preview:
https://x.com/soumithchintala/status/2053940215505645938

Very cool announcement from Thinky! The model looks nice (they go into some reasonable amount of detail), and reading some parts of the blog you can definitely see that the infra guys had a lot of fun there!
https://x.com/giffmana/status/2053953584300003405

GPT-Realtime-2 for instantly translating audio in realtime
https://x.com/gdb/status/2053134883040514350

gpt-realtime-2 is a great voice model (with a typically bad OpenAI name). Voice models are natively processing speech, not transcribing it, so the intelligence of the model matters. The old voice model was GPT-4o level, this is much smarter (how smart? OpenAI gave no benchmarks)
https://x.com/emollick/status/2053998691040583882

have been excited for realtime voice-to-voice translation as an AI application since we started OpenAI. extremely cool to see it now available in the API for anyone to build with:
https://x.com/gdb/status/2052480998668206262

people are really starting to use voice to interact with AI, especially when they have a lot of context to dump. GPT-Realtime-2 comes to the API today; it is a pretty big step forward. (we are working on improvements to voice in chat.)
https://x.com/sama/status/2052462271667028211

You can now just build amazing voice agents, with the GPT-Realtime-2 reasoning model in our API:
https://x.com/gdb/status/2052448850796011931

Announcing agentic performance benchmarking for Speech to Speech models on Artificial Analysis. We use 𝜏-Voice to measure tool calling and customer interaction voice agent capabilities in realistic customer service scenarios. Even the strongest Speech to Speech (S2S) models
https://x.com/ArtificialAnlys/status/2054234919887573292

Every voice release since 2024 has acted like it’s finally building “Her”. But where are we really, and what will it take to get there? @aiDotEngineer
https://x.com/neilzegh/status/2053945753073074484?s=20

Why Pipelines won’t build Voice AGI — Neil Zeghidour, CEO, Gradium AI – YouTube

Granola — The AI Notepad for back-to-back meetings
https://www.granola.ai/?via=adops-tldr-tech&dub_id=zrB2iDoskHcSooiw

Latest spogo (Spotify cli) is much faster, codex is my dj now. https://t.co/K4WviRSXG3 If you wanna play YouTube to Sonos, check out
https://x.com/steipete/status/2053310800773685600
