Multimodal: AI News Week Ending 07/11/2025

Multimodal: AI News Week Ending 07/11/2025

July 11, 2025

Image created with OpenAI GPT-Image-1. Image prompt: mid‑1990s web‑browser screenshot, CRT glow, 256‑color dithering — Animated fire GIF border around whole page — mixed media icons (text + image + audio) caption “Multimodality” — crisp pixel edges, screen‑door scan‑lines, phosphor glow

Introducing Shortcut — the first superhuman Excel agent.
Shortcut one-shots most knowledge work tasks on Excel.
It even scores >80% on Excel World Championship Cases in ~10 minutes. That’s 10x faster than humans.
https://x.com/nicochristie/status/1940440489972649989

“Comet browser gives the first glimpse of 100x productivity” – Early Chrome PM, a16z GP. https://x.com/AravSrinivas/status/1943508746115928315

Comet invites have begun to roll out! https://x.com/AravSrinivas/status/1943383973675340079

Comet is here. A web browser built for today’s internet. https://x.com/perplexity_ai/status/1942969263305671143

Comet vs Chrome: memory consumption https://x.com/AravSrinivas/status/1943759363203830015

I reached out to Chrome to offer Perplexity as a default search engine option a long time ago. They refused. Hence we decided to build @PerplexityComet browser.”” / X https://x.com/AravSrinivas/status/1942993484341776729

Introducing Comet: Browse at the speed of thought https://www.perplexity.ai/hub/blog/introducing-comet

Introducing Perplexity Max. Our most valuable subscription tier yet. Built for those who demand more, Max gets you unlimited Labs queries, access to a broader suite of frontier models, and early access to products like Comet. https://x.com/perplexity_ai/status/1940443479710257226

YouTube on Comet is so much better. iykyk”” / X https://x.com/AravSrinivas/status/1943259809882464405

🩺 Patients now plug symptoms into chatbots and get fixes that 17 doctors missed, like a 5-year jaw click solved in 1 min. 🤖 And now we have new benchmarks to test that like HealthBench, with 5,000 test chats And MAI-DxO AI-orchestration system that diagnoses 4x more https://x.com/rohanpaul_ai/status/1943642428591989217

Grok 4 drops tonight! 👀 Leaked benchmarks say it’ll be #1 at Coding and Math, beating Claude and Gemini. How will it compare with real-world use? We’ll see once it enters the Arena. Here’s what we know right now 🧵 👇 https://x.com/lmarena_ai/status/1943003747539652942

If the Grok 4 leaked benchmarks are right, it is going to be very useful that Humanity’s Last Exam has a holdout set of questions, because a rumored 45% score is a very big gain over the 20% or so of o3 & Gemini, and it would be pretty impressive (assuming no data contamination)”” / X https://x.com/emollick/status/1941181796416442556

Youre struggling to raise money for your “AI agents for { x }” idea. Grok4 is printing money by literally managing vending machines, and hypothetically could make $1T by operating simple companies Were cooked, its over. https://x.com/arthurmacwaters/status/1943171049010688060

Grok-4 achieves 50.7% on HLE with test-time-compute, tools and multiple parralel agents https://x.com/scaling01/status/1943165061863743600

xAI gave us early access to Grok 4 – and the results are in. Grok 4 is now the leading AI model. We have run our full suite of benchmarks and Grok 4 achieves an Artificial Analysis Intelligence Index of 73, ahead of OpenAI o3 at 70, Google Gemini 2.5 Pro at 70, Anthropic Claude https://x.com/ArtificialAnlys/status/1943166841150644622

My thoughts on Grok 4 Heavy after 12hrs: Crazy good! “Create an animation of a crowd of people walking to form “Hello world, I am Grok” as camera changes to birds-eye.” And it 1-shotted the *entire* thing. No other model comes close. Watch the full clip. https://x.com/mckaywrigley/status/1943385794414334032

Grok AI to be available in Tesla vehicles next week, Musk says | Reuters https://www.reuters.com/business/autos-transportation/grok-ai-be-available-tesla-vehicles-next-week-musk-says-2025-07-10/

Grok 4 Pricing: Input Token Price: $3.00 Output Token Price: $15.00 more expensive than Gemini 2.5 Pro and o3″” / X https://x.com/scaling01/status/1943168223102321003

🌊 SYSTEM PROMPT LEAK 🌊 Here’s the new Grok 4 system prompt! PROMPT: “””””” # System Prompt You are Grok 4 built by xAI. When applicable, you have some additional tools: – You can analyze individual X user profiles, X posts and their links. – You can analyze content uploaded by”” / X https://x.com/elder_plinius/status/1943171871400194231

Elon Musk’s xAI launches Grok 4 alongside a $300 monthly subscription | TechCrunch https://techcrunch.com/2025/07/09/elon-musks-xai-launches-grok-4-alongside-a-300-monthly-subscription/

Grok 4 is now available for Perplexity Pro and Max subscribers. Enjoy! https://x.com/perplexity_ai/status/1943437826307297480

Grok 4 is the new champion of the Extended NYT Connections benchmark! It sets a new high score of 92.4, beating o3-pro’s 87.3. https://x.com/lechmazur/status/1943245535973945428

Grok-4 confirmed to have a 256K context window https://x.com/scaling01/status/1943170092012818608

Grok-4 with extremely strong long-context performance!”” / X https://x.com/scaling01/status/1943402954301600090

I took Grok-4 Heavy through my real-life tests. The “”bones”” are there, reasoning is strong (no, it’s not true they “”just overfitted on tests””). But the post-training phase was clearly VERY rushed, surprising for the top-tier model. Good thing it is incrementally improvable!”” / X https://x.com/MParakhin/status/1943696435901305256

Really need to see the model card & red teaming report along with Grok 4’s release (still none for Grok 3)”” / X https://x.com/emollick/status/1942715402397835464

Remember Elon firing against OpenAI for not being open-source ? So where are the Grok-2 and Grok-3 weights? https://x.com/scaling01/status/1943485492852375635

RT @ArtificialAnlys: xAI gave us early access to Grok 4 – and the results are in. Grok 4 is now the leading AI model. We have run our full…”” / X https://x.com/TheGregYang/status/1943185084187840903

No matter how good Grok 4 is, I hope xAI is more open about what they are doing & why. The lack of a model card months after Grok 3 & the repeated apologies for breaches of xAI’s own processes highlight a need for transparency. Especially if they want non-X users to trust Grok.”” / X https://x.com/emollick/status/1941205200255189406

RT @ordinarytings: Grok is currently calling itself ‘MechaHitler’ https://x.com/zacharynado/status/1942708883442508102

RT @theo: WARNING: do NOT give Grok 4 access to email tool calls. It WILL contact the government!!! Grok 4 has the highest “”snitch rate”” o…”” / X https://x.com/imjaredz/status/1943413213581791416

So Grok 3 has had three separate incidents where apparently unvetted changes to the deployed system caused a large-scale ethical issue and an emergency rollback. I don’t think you can do a Grok 4 launch that doesn’t at least address this honestly, if user trust matters.”” / X https://x.com/emollick/status/1943020566304178242

Introducing Grok 4, the world’s most powerful AI model. Watch the livestream now: https://x.com/xai/status/1943158495588815072

Grok 4 available for all Perplexity Pro and Max users. Congrats to xAI team for impressive benchmark scores. Look forward to seeing how people use this model both on Perplexity and Comet! https://x.com/AravSrinivas/status/1943438527511040270

Grok 4 benchmarks look incredible! Look forward to integrating the smartest models directly on Perplexity Max as well letting it run agentic tasks on Comet!”” / X https://x.com/AravSrinivas/status/1943194733678862780

Grok 4 early benchmarks in comparison to other models. Humanity last exam diff is 🔥 Visualised by @marczierer https://x.com/testingcatalog/status/1941178793445761381

Chat, Brainstorm, and Build Real Business Apps Using Manus AI Agent (Step-by-Step Guide) @ManusAI_HQ agent can turn your prompts into slides, images, videos, and more. ↗️ Quick Read: https://x.com/AIAgentsNewz/status/1939915461900345354

Google DeepMind released a new series of medical vision LMs 👏 > MedSigLIP: ~900M param CLIP-like model > MedGemma-27B-it: larger MedGemma They have two insane apps for MedGemma, a scan explainer and an agent-agent doctor simulator 🤯 https://x.com/mervenoyann/status/1943325395102601530

Mayo Clinic researchers develop AI tool to detect surgical site infections from patient-submitted photos – Mayo Clinic News Network https://newsnetwork.mayoclinic.org/discussion/mayo-clinic-researchers-develop-ai-tool-to-detect-surgical-site-infections-from-patient-submitted-photos/

Meta buys minority stake in eyewear giant EssilorLuxottica worth $3.5B (META:NASDAQ) | Seeking Alpha https://seekingalpha.com/news/4465967-meta-buys-minority-stake-in-eyewear-giant-essilorluxottica-worth-35b

crazy that in 2025 i can converse in 1000 tokens/sec on my single GPU machine with AI that’s world-class at math and programming but i still have to type. i can’t speak to it, at least not in low-enough latency to carry a conversation we don’t have this tech yet. why not?”” / X https://x.com/jxmnop/status/1941995444730540050

To use a PDF with the OpenAI API, you can just pass its URL now—no upload needed! 📄🔗 https://x.com/OpenAIDevs/status/1943428227801977037

One of the biggest pain points in using LLMs for large-scale document extraction is actually coming up with the schema in the first place – it’s a long and tedious process ⏳ We built an two-stage, e2e agent workflow that does both schema generation and extraction: 1️⃣ It first https://x.com/jerryjliu0/status/1942375929353035897

A lot of companies run on @Snowflake ❄️. A lot of companies also depend on massive collections of PDFs. 📑 For the first time, you can now combine the two with high accuracy! This is a great tutorial by @_jreini highlighting how you can use LlamaParse to parse complex documents https://x.com/jerryjliu0/status/1943107617313984610

Excited to introduce Reka Vision, an agentic visual understanding and search platform. Transform your unstructured multimodal data into insights and actions. https://x.com/RekaAILabs/status/1942621988390088771

Puzzles: Unbounded Video-Depth Augmentation for Scalable End-to-End 3D Reconstruction
https://jiahao-ma.github.io/puzzles/

🚨 Veo 3 now lets you generate audio + video starting from an image This one is cool – I started with JUST the first frame of the model and prompted the dialogue, the addition of a second character, and the action in the scene. Huge breakthrough for character consistency! https://x.com/venturetwins/status/1942371183644794987

Why Do Some Language Models Fake Alignment While Others Don’t? https://arxiv.org/pdf/2506.18032

MedSigLIP: create embeddings for medical images and text – 400M text + 400M vision encoder – Useful for classification, semantic image retrieval, and more -Trained with chest X-rays, CT slices, MRI slices, dermatology images, and more. https://x.com/osanseviero/status/1943584472206549453

RT @_akhaliq: Microsoft just dropped Phi-4-mini-flash-reasoning on Hugging Face Phi-4-mini-flash-reasoning is a lightweight open model bui…”” / X https://x.com/ClementDelangue/status/1943487803658002720

Multimodal reasoning models often ignore the image and guess from text, so answers break. Perception-Aware Policy Optimization (PAPO) trains the policy to notice vision by punishing it when a masked image changes nothing . The paper checks failures first, finding 67% mistakes https://x.com/rohanpaul_ai/status/1943290860524982510

Reka Flash 3.1 and Reka Quant https://reka.ai/news/reka-flash-3-1-and-reka-quant

Skywork-R1V3: a multimodal reasoning model. Reportedly SOTA performance in the open source, up there with frontier models on STEM vision/reasoning evals. Still strong on text. Mixed preference optimization (PPO&GRPO++). Derived from Qwen2.5 through maaany steps. Great paper. https://x.com/teortaxesTex/status/1942641002902090171

A note on this: We have enough evidence from controlled studies that it is likely smart to ask a frontier model for a second opinion. But it is also worth noting that the weakest link in both studies & reality is AI’s ability to “see” medical images. Hallucinations are common.”” / X https://x.com/emollick/status/1941582486456172922

Nanonets-OCR-s and ChatDOC/OCRFlux-3B are two top open source OCR models. Both are derived from Qwen2.5-VL-3B and thus subject to “”Qwen RESEARCH LICENSE AGREEMENT”” @Alibaba_Qwen pretty please, can we have Apache 2.0 license on Qwen2.5-VL-3B? Love you! 🥰🐬”” / X https://x.com/cognitivecompai/status/1942606867697426567

Announcing Grok 4 Fire Enrich – an open source contact enrichment engine AI agents analyze any CSV and then automatically fill in missing data like key decision makers, company size, and more Orchestrated by @Grok 4 and powered by @firecrawl_dev Demo and repo 👇 https://x.com/ericciarla/status/1943351359211999706

just fyi that the grok3 (or ~4) base model is likely 2.4T based on what that one AMD guy publicly alluded to about a customer”” / X https://x.com/kalomaze/status/1942996555088134592

thought the launch livestream was a little lame, but grok 4 the model is genuinely impressive. thought for 6 minutes and found the three bugs in a piece of code that took me a long time to figure out earlier this week https://x.com/vikhyatk/status/1943199776931008552

grok 3 had high reasoning, grok 4 has heil reasoning”” / X https://x.com/stevenheidel/status/1942708514679579134

Grok 4 is available in Cursor! We’re curious to hear what you think.”” / X https://x.com/cursor_ai/status/1943353195108901035

Grok 4 release livestream on Wednesday at 8pm PT @xAI”” / X https://x.com/elonmusk/status/1942325820170907915

I haven’t played with the new Grok yet, but I have used the new Liquid v2 models and they are by far the best in the small-and-fast class. https://x.com/MParakhin/status/1943344684220510221

It was awesome to get early access to Grok 4 and test it on bio and health benchmarks! Awesome work by @timjhudelmaier @adibvafa @Radii2323 @ishanjmukherjee for the epic sprint Congrats to @jimmybajimmyba @veggie_eric and team on the new model. Over 40% on HLE with 10x scaleup https://x.com/pdhsu/status/1943174995020255287

Live in Cline: Grok 4 https://x.com/cline/status/1943354290908586455

Maybe the real Grok 4 are the friends we made along the way waiting for the livestream 🤣”” / X https://x.com/iScienceLuvr/status/1943156273798684717

RT @simonw: I wrote up my notes so far on the thing where Grok sometimes searches X for tweets from:elonmusk when you ask it about controve…”” / X https://x.com/jeremyphoward/status/1943474545060647197

so that Grok 3.5 leak was a slight underestimate of Grok 4. Probably an early snapshot, given shared base and scaling RL. As I’ve said in May, they’ve really built a frontier lab in 1.5 years. https://x.com/teortaxesTex/status/1943181858478477648

RT @visegrad24: BREAKING: Grok has been blocked in Turkey for allegedly insulting Erdogan. The prosecutor’s office is investigating becau…”” / X https://x.com/zacharynado/status/1942946542345736207