Image created with Flux Pro v1.1 Ultra.

It’s sometimes hard to grasp the significance of the reasoning and logic improvements starting to emerge in powerful models like GPT-5. Here’s a *very simple* example of how capable these models are getting.

I took a recent NVIDIA earnings call transcript (23 pages, roughly 7,800 words) and made a single change: in the sentence “and gross margin will improve and return to the mid-70s,” I swapped “mid-70s” for “mid-60s.”

For any remotely tuned-in financial analyst, this would look out of place, because margins can’t “improve and return” to a number lower than the one cited elsewhere in the document. But probably 95% of people reading this transcript would not have spotted the modification, because it blends right into the other 7,800 words.

Using Box AI to test a variety of AI models, I then asked each one: “Are there any logical errors in this document? Please provide a one sentence answer.”
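The perturbation step above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the actual Box AI workflow: the `transcript` placeholder and the `perturb_transcript` helper are stand-ins for loading the real 23-page document and sending it, with the prompt, to each model under test.

```python
def perturb_transcript(text: str) -> str:
    """Introduce a single logical inconsistency by changing one figure."""
    target = "and gross margin will improve and return to the mid-70s"
    modified = target.replace("mid-70s", "mid-60s")
    # Replace only the first occurrence, leaving the rest of the text intact.
    return text.replace(target, modified, 1)

# The question posed to every model, verbatim from the experiment.
PROMPT = (
    "Are there any logical errors in this document? "
    "Please provide a one sentence answer."
)

# Stand-in for the full earnings call transcript (hypothetical snippet).
transcript = "... and gross margin will improve and return to the mid-70s ..."
perturbed = perturb_transcript(transcript)

# `perturbed` plus PROMPT would then be sent to each model being compared.
print(perturbed)
```

The point of the design is that the edit is a one-word change buried in ~7,800 words, so spotting it requires cross-referencing the margin figures mentioned elsewhere in the document rather than pattern-matching on local fluency.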

GPT-4.1, GPT-4.1 mini, and a handful of other models that were state of the art just ~6 months ago generally responded that there were no logical errors in the document. To these models, the document probably seems coherent and matches what they would expect an earnings transcript to look like, so nothing stands out as worth attention: sort of a reverse hallucination.

GPT-5, on the other hand, quickly discovered the issue and responded with:

“Yes — the document contains an internal inconsistency about gross-margin guidance, at one point saying margins will “return to the mid-60s” and later saying they will be “in the mid-70s” later this year.”

Amazingly, this happened with GPT-5, GPT-5 mini, and, remarkably, *even* GPT-5 nano. Bear in mind, GPT-5 nano’s output tokens are priced at 1/20th of GPT-4.1’s. So: more intelligent (at this use case) for 5% of the cost.

Now, while doing error reviews on business documents isn’t a daily occurrence for every knowledge worker, these types of issues show up in a variety of ways when dealing with large unstructured data sets, like financial documents, contracts, transcripts, reports, and more. It can be finding a fact, spotting a logical fallacy, running a hypothetical, or applying sophisticated deductive reasoning.

And the ability to apply more logic and reasoning to enterprise data becomes especially critical when deploying AI agents in the enterprise. So, it’s amazing to see the advancements in this space right now, and this is going to open up a ton more use cases for businesses.
https://x.com/levie/status/1953670264988016931

🏆NVIDIA AI-Q, an NVIDIA Blueprint for building AI agents with advanced reasoning skills, is now the leading open and portable #AIagent for high-fidelity research on the Deep Research Bench leaderboard. ➡️ https://x.com/NVIDIAAIDev/status/1952429440551547332

File under nonsense-sounding headlines… LOL
Rumble Offers $1.2 Billion to Buy Northern Data. What a Deal Means for Tether. – Barron’s https://www.barrons.com/articles/rumble-stock-northern-data-tether-a474fc16

SoftBank buys Foxconn’s Ohio plant to advance Stargate AI push, Bloomberg News reports | Reuters https://www.reuters.com/business/media-telecom/softbank-buys-foxconns-ohio-plant-advance-stargate-ai-push-bloomberg-news-2025-08-08/

Here is how we are prioritizing compute over the next couple of months in light of the increased demand from GPT-5: 1. We will first make sure that current paying ChatGPT users get more total usage than they did before GPT-5. 2. We will then prioritize API demand up to the… / X https://x.com/sama/status/1955077002945585333

U.S. Government to Take Cut of Nvidia and AMD A.I. Chip Sales to China – The New York Times https://www.nytimes.com/2025/08/10/technology/us-government-nvidia-amd-chips-china.html

showcase of a type of hard, valuable task that gpt-5 can do where previous models struggled… / X https://x.com/gdb/status/1953700116365492552

If Jensen truly believed AGI was near, Nvidia wouldn’t sell a single GPU. / X https://x.com/Yuchenj_UW/status/1954756616907362328

This is actually pretty surprising and something that should lead companies to change how they are thinking about hosting. Model performance for the open-weights GPT model varies by meaningful amounts depending on who is hosting it, with Azure & AWS being low. Worth watching. / X https://x.com/emollick/status/1955365624613630349

📊 Even with a frozen backbone the gains are massive: • COCO detection: 66.1 mAP (SOTA with ❄️🤯) • ADE20k segmentation: linear 55.9 mIoU (+6 vs. previous SSL), 63.0 with a decoder on top • 3D correspondence: 64.4 recall on NAVI • Video tracking: 83.3 J&F on DAVIS https://x.com/BaldassarreFe/status/1956027888051892594

Me when random people at work want to learn about FSDP https://x.com/code_star/status/1955126149610364970

thanks @johnschulman2 for the great idea and thanks @srush_nlp for the GPUs 🙂 some fun future work – generate from this model to check more thoroughly for memorization – try the 120B version – try instruction-tuning – compare to other base models via ‘model diffing’ – compare… / X https://x.com/jxmnop/status/1955436118620488059

One of the value props of @modal_labs is that we can get you GPU capacity… quite fast. I ran a quick scale-up test of H100 containers. Modal users can scale up 100 H100s in about 12s and 300 in about 4 minutes. The curve doesn’t end there – you can push it to 1k+! https://x.com/bernhardsson/status/1956073789550420330

thank you to our partners at microsoft, nvidia, oracle, google, and coreweave for making this possible! lots and lots of GPUs working overtime. / X https://x.com/sama/status/1953538020013178998

There’s a revolutionary change coming to HBM4 with custom base dies. Various accelerators are doing many things differently with custom HBM, including OpenAI, Nvidia, and AMD. There are many problems being solved, including shoreline area, memory controller offload… / X https://x.com/dylan522p/status/1955285178492080370

This is fraud, no 2 ways about it. >10% performance degradation is a joke. This is the equivalent of a store emptying out 10% from a can and selling you the remaining 90% at full price. / X https://x.com/nrehiew_/status/1955613510463037611

Exclusive: US embeds trackers in AI chip shipments to catch diversions to China, sources say | Reuters https://www.reuters.com/world/china/us-embeds-trackers-ai-chip-shipments-catch-diversions-china-sources-say-2025-08-13/

Update on this: the reason Microsoft (and probably Amazon) were so much worse at serving gpt-oss is that they ignored the reasoning effort setting and stuck with the default medium one. The numbers make sense for that hypothesis, and someone from MS confirmed in the comments that… / X https://x.com/giffmana/status/1955710876528599217

Huge thanks to @NSF & @NVIDIA for a $152M grant to support us to build the next Hubble Telescope of AI. We’ll push the limits of fully open models for science, study the science of AI, and lay the groundwork for sustainable innovation, and national and global competitiveness. / X https://x.com/HannaHajishirzi/status/1955984650599325808

NSF and NVIDIA award Ai2 a combined $152M to support building a national level fully open AI ecosystem | Ai2 https://allenai.org/blog/nsf-nvidia

With fresh support of $75M from @NSF and $77M from @NVIDIA, we’re set to scale our open model ecosystem, bolster the infrastructure behind it, and fast‑track reproducible AI research to unlock the next wave of scientific discovery. 💡 https://x.com/allen_ai/status/1955966785175388288

Have you ever felt you are developing CUDA kernels and your tests often run into illegal memory access (IMA for short) and you have no idea how to debug? We have collaborated with the @nvidia team to investigate how CUDA core dump can help; check out the blogpost to learn more! / X https://x.com/vllm_project/status/1955478388178817298

Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models. “we replace reward guided test-time noise optimization in diffusion models with a Noise Hypernetwork that modulates initial input noise.” “We show that our approach recovers a substantial portion of the… https://x.com/iScienceLuvr/status/1955958029993828724
