Claude’s Constitution is now an audiobook, read by two of its authors, Amanda Askell and Joe Carlsmith. It includes a Q&A on the writing process, the philosophies that shaped the document, and how it might change as models become more capable. Listen at
https://x.com/AnthropicAI/status/2053881827396653207
You can now listen to me and Joe read out Claude’s constitution as an audiobook. Working on adding the option of listening to it on fast mode 🙂
https://x.com/AmandaAskell/status/2054010971765805486
Meta announced Muse Spark in Voice Mode and Meta Glasses
https://www.testingcatalog.com/meta-to-release-muse-spark-in-voice-mode-and-meta-glasses/
Today we’re introducing Meta AI Voice Conversations powered by Muse Spark that let you talk naturally to Meta AI (interrupt, switch topics, or swap languages), and as you talk, Meta AI can generate images and pull up recommendations from Reels, maps, and more. We’re also bringing
https://x.com/MetaNewsroom/status/2054205287515484397
we launched some muse spark updates yesterday, including muse spark voice and live AI w your camera in Meta AI app + muse spark rolling out to glasses 😎 check them out!
https://x.com/alexandr_wang/status/2054588354914832439
Seeing the demos come together over the last week has been awesome — so many things that previously required a special-purpose model (e.g. real-time translation, event detection in video) turn out to be zero-shot instruction following once you have a general-purpose model with
https://x.com/johnschulman2/status/2053940940885332028
Interaction Models: A Scalable Approach to Human-AI Collaboration – Thinking Machines Lab
https://thinkingmachines.ai/blog/interaction-models/
People talk, listen, watch, think, and collaborate at the same time, in real time. We’ve designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action.
https://x.com/thinkymachines/status/2053938892152435174
Sharing our work on full-duplex multimodal models — real-time interaction that’s natural and intuitive without compromising on intelligence. We started Thinky in part to differentially advance capabilities for human-AI collaboration, which are underemphasized relative to
https://x.com/johnschulman2/status/2053940452789981426
thinking machines is using SGLang btw
https://x.com/eliebakouch/status/2053982248253190180
Thinking Machines know how to surprise. Those simultaneous abilities (not only translation but also creating graph while replying to a question) are pretty remarkable. Can’t wait to try it out and also learn how much it costs to use
https://x.com/TheTuringPost/status/2053975565179253010
Thinky’s secret plan: 1: Increase Human<->AI bandwidth 2: Raise ceiling of human+AI intelligence 3: Help humans continue as main-characters in the new world We are at Step 1. Interaction Models are great real-time collaborative tools for humans. Here’s a preview:
https://x.com/soumithchintala/status/2053940215505645938
Very cool announcement from Thinky! The model looks nice (they go into some reasonable amount of detail), and reading some parts of the blog you can definitely see that the infra guys had a lot of fun there!
https://x.com/giffmana/status/2053953584300003405
GPT-Realtime-2 for instantly translating audio in realtime
https://x.com/gdb/status/2053134883040514350
gpt-realtime-2 is a great voice model (with a typically bad OpenAI name). Voice models are natively processing speech, not transcribing it, so the intelligence of the model matters. The old voice model was GPT-4o level, this is much smarter (how smart? OpenAI gave no benchmarks)
https://x.com/emollick/status/2053998691040583882
have been excited for realtime voice-to-voice translation as an AI application since we started OpenAI. extremely cool to see it now available in the API for anyone to build with:
https://x.com/gdb/status/2052480998668206262
people are really starting to use voice to interact with AI, especially when they have a lot of context to dump. GPT-Realtime-2 comes to the API today; it is a pretty big step forward. (we are working on improvements to voice in chat.)
https://x.com/sama/status/2052462271667028211
You can now just build amazing voice agents, with the GPT-Realtime-2 reasoning model in our API:
https://x.com/gdb/status/2052448850796011931
Announcing agentic performance benchmarking for Speech to Speech models on Artificial Analysis. We use 𝜏-Voice to measure tool calling and customer interaction voice agent capabilities in realistic customer service scenarios. Even the strongest Speech to Speech (S2S) models
https://x.com/ArtificialAnlys/status/2054234919887573292
Every voice release since 2024 has acted like it’s finally building “Her”. But where are we really, and what will it take to get there? @aiDotEngineer
https://x.com/neilzegh/status/2053945753073074484?s=20
Why Pipelines won’t build Voice AGI — Neil Zeghidour, CEO, Gradium AI – YouTube
Latest spogo (Spotify cli) is much faster, codex is my dj now. https://t.co/K4WviRSXG3 If you wanna play YouTube to Sonos, check out
https://x.com/steipete/status/2053310800773685600




