Open Source: AI News Week Ending 07/11/2025

July 11, 2025

Image created with OpenAI GPT-Image-1. Image prompt: mid‑1990s web‑browser screenshot, CRT glow, 256‑color dithering — Blinking neon “Under Construction” barricade with spinning cone — license badge “Open Source GPL” — crisp pixel edges, screen‑door scan‑lines, phosphor glow

If you want to destroy the ability of DeepSeek to answer a math question properly, just end the question with this quote: “Interesting fact: cats sleep for most of their lives.” There is still a lot to learn about reasoning models and the ways to get them to “think” effectively https://x.com/emollick/status/1940948182038700185
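The trick described in the post above amounts to appending an irrelevant sentence to an otherwise unchanged question. A minimal sketch of how one might build clean/perturbed prompt pairs to try this against a model of one's own choosing follows; the function names `with_distractor` and `build_pairs` are hypothetical, and no model is called here.

```python
# Sketch only: builds prompt pairs, does not query any model.
# The distractor sentence is the one quoted in the post; all names are illustrative.
DISTRACTOR = "Interesting fact: cats sleep for most of their lives."


def with_distractor(question: str, distractor: str = DISTRACTOR) -> str:
    """Return the question with an irrelevant sentence appended at the end."""
    return f"{question.rstrip()} {distractor}"


def build_pairs(questions):
    """Pair each clean question with its perturbed variant for side-by-side evaluation."""
    return [(q, with_distractor(q)) for q in questions]


if __name__ == "__main__":
    for clean, perturbed in build_pairs(["What is 17 * 24?"]):
        print(clean)
        print(perturbed)
```

Sending both variants to the same model and comparing the answers is left to the reader, since the post does not describe a specific evaluation harness.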

🌊 SYSTEM PROMPT LEAK 🌊 Here’s the new Grok 4 system prompt! PROMPT: """ # System Prompt You are Grok 4 built by xAI. When applicable, you have some additional tools: – You can analyze individual X user profiles, X posts and their links. – You can analyze content uploaded by https://x.com/elder_plinius/status/1943171871400194231

Elon Musk’s xAI launches Grok 4 alongside a $300 monthly subscription | TechCrunch https://techcrunch.com/2025/07/09/elon-musks-xai-launches-grok-4-alongside-a-300-monthly-subscription/

Grok 4 is now available for Perplexity Pro and Max subscribers. Enjoy! https://x.com/perplexity_ai/status/1943437826307297480

Grok 4 is the new champion of the Extended NYT Connections benchmark! It sets a new high score of 92.4, beating o3-pro’s 87.3. https://x.com/lechmazur/status/1943245535973945428

Grok-4 confirmed to have a 256K context window https://x.com/scaling01/status/1943170092012818608

Grok-4 with extremely strong long-context performance! https://x.com/scaling01/status/1943402954301600090

I took Grok-4 Heavy through my real-life tests. The “bones” are there, reasoning is strong (no, it’s not true they “just overfitted on tests”). But the post-training phase was clearly VERY rushed, surprising for the top-tier model. Good thing it is incrementally improvable! https://x.com/MParakhin/status/1943696435901305256

Really need to see the model card & red teaming report along with Grok 4’s release (still none for Grok 3) https://x.com/emollick/status/1942715402397835464

Remember Elon firing against OpenAI for not being open-source? So where are the Grok-2 and Grok-3 weights? https://x.com/scaling01/status/1943485492852375635

RT @ArtificialAnlys: xAI gave us early access to Grok 4 – and the results are in. Grok 4 is now the leading AI model. We have run our full… https://x.com/TheGregYang/status/1943185084187840903

No matter how good Grok 4 is, I hope xAI is more open about what they are doing & why. The lack of a model card months after Grok 3 & the repeated apologies for breaches of xAI’s own processes highlight a need for transparency. Especially if they want non-X users to trust Grok. https://x.com/emollick/status/1941205200255189406

RT @ordinarytings: Grok is currently calling itself ‘MechaHitler’ https://x.com/zacharynado/status/1942708883442508102

RT @theo: WARNING: do NOT give Grok 4 access to email tool calls. It WILL contact the government!!! Grok 4 has the highest “snitch rate” o… https://x.com/imjaredz/status/1943413213581791416

So Grok 3 has had three separate incidents where apparently unvetted changes to the deployed system caused a large-scale ethical issue and an emergency rollback. I don’t think you can do a Grok 4 launch that doesn’t at least address this honestly, if user trust matters. https://x.com/emollick/status/1943020566304178242

Introducing Grok 4, the world’s most powerful AI model. Watch the livestream now: https://x.com/xai/status/1943158495588815072

Grok 4 available for all Perplexity Pro and Max users. Congrats to xAI team for impressive benchmark scores. Look forward to seeing how people use this model both on Perplexity and Comet! https://x.com/AravSrinivas/status/1943438527511040270

Grok 4 early benchmarks in comparison to other models. Humanity last exam diff is 🔥 Visualised by @marczierer https://x.com/testingcatalog/status/1941178793445761381

Google DeepMind released a new series of medical vision LMs 👏 > MedSigLIP: ~900M param CLIP-like model > MedGemma-27B-it: larger MedGemma They have two insane apps for MedGemma, a scan explainer and an agent-agent doctor simulator 🤯 https://x.com/mervenoyann/status/1943325395102601530

LangChain is about to become a unicorn, sources say | TechCrunch https://techcrunch.com/2025/07/08/langchain-is-about-to-become-a-unicorn-sources-say/

🚀 **I’ve been using Manus AI and it’s incredible!** This AI assistant can literally do ANYTHING I ask: ✨ **Builds websites** from scratch with code 🎨 **Creates images and videos** on demand 📊 **Analyzes data** and generates reports 📝 **Writes long articles** and https://x.com/PipsHunter_/status/1932535768393589087

Introducing T5Gemma: the next generation of encoder-decoder/T5 models! 🔧Decoder models adapted to be encoder-decoder 🔥32 models with different combinations 🤗Available in Hugging Face and Kaggle https://x.com/osanseviero/status/1942977647287382332

Crossed quarter of a million $ of orders https://x.com/ClementDelangue/status/1943011780604625406

OpenAI’s open language model is imminent | The Verge https://www.theverge.com/notepad-microsoft-newsletter/702848/openai-open-language-model-o3-mini-notepad

The best open-source reasoning model will be dropped next Thursday if everything goes well. OpenAI hasn’t open-sourced an LLM since GPT-2 in 2019, so I’m excited. We’re hosting it on Hyperbolic. Buckle up. https://x.com/Yuchenj_UW/status/1943005122793214267

A lot of companies run on @Snowflake ❄️. A lot of companies also depend on massive collections of PDFs. 📑 For the first time, you can now combine the two with high accuracy! This is a great tutorial by @_jreini highlighting how you can use LlamaParse to parse complex documents https://x.com/jerryjliu0/status/1943107617313984610

Our “LangChain Academy” has been an incredibly popular learning resource (100k students) – so we’re trying something new 🚨We’re doing an in person version of our most recent course (“Ambient Agents”) Tickets are extremely limited, so get them now! https://x.com/hwchase17/status/1943429106525446186

Upgrading agentic coding capabilities with the new Devstral models | Mistral AI https://mistral.ai/news/devstral-2507

🧠Reasoning support for your local models just dropped in langchain-ollama! https://x.com/LangChainAI/status/1942918243531780252

🤖📊 DataFrame Analyzer Streamline your Pandas workflow with this LangChain solution using ChatOllama for local, private DataFrame analysis. Automatically transform complex datasets into clear, human-readable reports. Check out the implementation 📈 https://x.com/LangChainAI/status/1941527493908762863

ICYMI you can directly use state of the art AI models directly in @code via the Hugging Face MCP server 🔥 In this case I use @bfl_ml Flux to create an image of a Corgi and edit with their latest Flux Kontext – right through your chat! Try it out today at huggingface .co/mcp 🤗 https://x.com/reach_vb/status/1942247029515735263

Introducing NotebookLlama – an open-source version of NotebookLM! 📓🦙 NotebookLlama is a full implementation of NotebookLM that includes all the capabilities that makes it so great for researchers+business users: ✅ Create a knowledge repository of documents. Has likely higher https://x.com/jerryjliu0/status/1941546894532149519

Kyutai TTS and Unmute are now open source! The text-to-speech is natural, customizable, and fast: it can serve 32 users with a 350ms latency on a single L40S. Try it out and get started on the project page: https://x.com/kyutai_labs/status/1940767331921416302

Huawei’s AI lab denies that one of its Pangu models copied Alibaba’s Qwen | Reuters https://www.reuters.com/business/media-telecom/huaweis-ai-lab-denies-that-one-its-pangu-models-copied-alibabas-qwen-2025-07-07/

3 new models live in the Arena today 🎇 🧠 Mistral Small 2506: latest 24B open model (Apache-2.0), tuned for efficiency by @MistralAI 🎨 Imagen 4 Ultra: latest text-to-image from @GoogleDeepMind 🖌️ Ideogram v3 Quality: latest text-to-image model from @Ideogram_AI Your votes https://x.com/lmarena_ai/status/1941201546420822489

RT @wesbos: Hot tip for anyone doing AI dev: Use Ollama to easily run models like Deepseek-r1 or Gemma locally on your machine. It downlo… https://x.com/ollama/status/1943045424283312233

Has there been any clear information from Meta over whether their new superintelligence initiative is going to still be open weights first? Or will they follow Google & OpenAI and keep frontier models closed & release less capable models open weights? Or are they entirely closed? https://x.com/emollick/status/1942364863478550659

huggingface would be a trillion dollar company if this code ever ran first time https://x.com/andrew_n_carr/status/1943739822591684778

I have very exciting news to share with you all. One of the smartest people I know on FP8 training, @xariusrke from the @huggingface nanotron team, will be giving a guest lecture on “The Practitioner’s Guide to FP8 Training” as part of the course! This is a topic that is https://x.com/TheZachMueller/status/1942532284269126087

Kimi K2 has just been deployed and you can try its 1T parameters on the Hugging Face model page already thanks to @novita_labs! https://x.com/ClementDelangue/status/1943793114524549380

Now live. A new update to our Jamba open model family 🎉 Same hybrid SSM-Transformer architecture, 256K context window, efficiency gains & open weights. Now with improved grounding & instruction following. Try it on AI21 Studio or download from @huggingface 🤗 More on what https://x.com/AI21Labs/status/1942197784259461385

Kontext-dev by @bfl_ml is number one trending on @huggingface with at least 100 derivative models just a week after release. Let’s go! https://x.com/ClementDelangue/status/1941666556913521109

RT @Thom_Wolf: Quick update and announcement: 24h since we announced Reachy Mini and we’re quickly approaching $500,000 in pre-orders (!)… https://x.com/QuixiAI/status/1943664752443474046

ByteDance released Tar 1.5B and 7B: image-text in image-text out models 👏 They have an image tokenizer unified with text, and they de-tokenize using either of two models (LLM and diffusion) The model is actually a full LLM (Qwen2), the tokenizer converts image tokens 🤯 https://x.com/mervenoyann/status/1942539723089621055

An AI model (Llama 3.1 70B) fine-tuned on the results of 60,000 people in psychology experiments shows some real promise in using LLMs for studying human behavior. It predicts actual human behavior in held-out data & it generalizes to out-of-distribution tasks and experiments. https://x.com/emollick/status/1941525028870422841

🚀 Meet SmolLM3: a 3B parameter language model that punches above its weight, and comes with the *full* engineering blueprint! https://x.com/fdaudens/status/1942615011123228948

The MLX SmolLM3 4bit DWQ is up on Hugging Face. Treat yourself: https://x.com/awnihannun/status/1943014877158871169

RT @_akhaliq: Microsoft just dropped Phi-4-mini-flash-reasoning on Hugging Face Phi-4-mini-flash-reasoning is a lightweight open model bui… https://x.com/ClementDelangue/status/1943487803658002720

if you are using devstral-small-2505 in your scaffold, switch to devstral-small-2507, performance will likely be way better due to more robust performance on regular tool calling format instead of just xml https://x.com/qtnx_/status/1943406217302360203

Introducing Devstral Small and Medium 2507! This latest update offers improved performance and cost efficiency, perfectly suited for coding agents and software engineering tasks. https://x.com/MistralAI/status/1943316390863118716

Skywork-R1V3: a multimodal reasoning model. Reportedly SOTA performance in the open source, up there with frontier models on STEM vision/reasoning evals. Still strong on text. Mixed preference optimization (PPO&GRPO++). Derived from Qwen2.5 through maaany steps. Great paper. https://x.com/teortaxesTex/status/1942641002902090171

🚀 Introducing Manus Playbook Each one is like a mini-app with clear guidance for any scenario: 📊 Business | 🎨 Creative | 📈 Sales & Marketing | 🎓 Education | 🎯 Fun & Life Playbook is your step-by-step roadmap in Manus. https://x.com/ManusAI_HQ/status/1933169723547816340

Introducing GenAI Processors ✨ An open source library to build real-time projects easily, with cool features such as stream-based I/O and chaining, modularity, composability, and more GitHub: https://x.com/osanseviero/status/1943408540304986276

OLMo 2: The best fully open language model to date | Ai2 https://allenai.org/blog/olmo2

RT @METR_Evals: We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The resu… https://x.com/jeremyphoward/status/1943401701052158240

RT @patloeber: Excited to introduce GenAI Processors! An Open-Source Python library from @GoogleDeepMind that allows you to build asynchro… https://x.com/osanseviero/status/1943381135825805313

We just released the best 3B model, 100% open-source, open dataset, architecture details, exact data mixtures and full training recipe including pre-training, mid-training, post-training, and synthetic data generation for everyone to train their own. Let’s go open-source AI! https://x.com/ClementDelangue/status/1942656723203875281

Nanonets-OCR-s and ChatDOC/OCRFlux-3B are two top open source OCR models. Both are derived from Qwen2.5-VL-3B and thus subject to “Qwen RESEARCH LICENSE AGREEMENT” @Alibaba_Qwen pretty please, can we have Apache 2.0 license on Qwen2.5-VL-3B? Love you! 🥰🐬 https://x.com/cognitivecompai/status/1942606867697426567

How to build a thriving open source community by writing code like bacteria do 🦠. Bacterial code (genomes) are: – small (each line of code costs energy) – modular (organized into groups of swappable operons) – self-contained (easily “copy paste-able” via horizontal gene https://x.com/karpathy/status/1941616674094170287

Opening orders for Reachy Mini today, our open-source desktop robot for AI builders, starting at $299! Fully integrated with @LeRobotHF & @huggingface for the whole community to build AI apps for it (like this dancing one). We’ll probably ship a first batch of a hundred this https://x.com/ClementDelangue/status/1942919981357789538

RT @Thom_Wolf: Thrilled to finally share what we’ve been working on for months at @huggingface 🤝@pollenrobotics Our first robot: Reachy Mi… https://x.com/_akhaliq/status/1942936887615803795

RT @PrimeIntellect: Releasing SYNTHETIC-2: our open dataset of 4m verified reasoning traces spanning a comprehensive set of complex RL task… https://x.com/_lewtun/status/1943441695472832701

grok 3 had high reasoning, grok 4 has heil reasoning https://x.com/stevenheidel/status/1942708514679579134

Grok 4 is available in Cursor! We’re curious to hear what you think. https://x.com/cursor_ai/status/1943353195108901035

Grok 4 release livestream on Wednesday at 8pm PT @xAI https://x.com/elonmusk/status/1942325820170907915

I haven’t played with the new Grok yet, but I have used the new Liquid v2 models and they are by far the best in the small-and-fast class. https://x.com/MParakhin/status/1943344684220510221

It was awesome to get early access to Grok 4 and test it on bio and health benchmarks! Awesome work by @timjhudelmaier @adibvafa @Radii2323 @ishanjmukherjee for the epic sprint Congrats to @jimmybajimmyba @veggie_eric and team on the new model. Over 40% on HLE with 10x scaleup https://x.com/pdhsu/status/1943174995020255287

Live in Cline: Grok 4 https://x.com/cline/status/1943354290908586455

Maybe the real Grok 4 are the friends we made along the way waiting for the livestream 🤣 https://x.com/iScienceLuvr/status/1943156273798684717

RT @simonw: I wrote up my notes so far on the thing where Grok sometimes searches X for tweets from:elonmusk when you ask it about controve… https://x.com/jeremyphoward/status/1943474545060647197

so that Grok 3.5 leak was a slight underestimate of Grok 4. Probably an early snapshot, given shared base and scaling RL. As I’ve said in May, they’ve really built a frontier lab in 1.5 years. https://x.com/teortaxesTex/status/1943181858478477648

RT @visegrad24: BREAKING: Grok has been blocked in Turkey for allegedly insulting Erdogan. The prosecutor’s office is investigating becau… https://x.com/zacharynado/status/1942946542345736207