Image created with gemini-2.5-flash-image with claude-sonnet-4-5. Image prompt: Minimalist luxury stable interior with empty stall, golden nameplate reading LLAMA on polished brass door, pristine marble floors, single dramatic spotlight, cold blue moonlight through windows, untouched hay and water, architectural emptiness, cinematic composition with deep shadows and negative space, bold white sans-serif text LLAMA overlaid across image
“At this point, papers testing whether AI can or cannot do something should try to test the strongest case, as well as a default. It is fine to say Llama 2 failed, but did a serious attempt to use GPT-5.1 Thinking in an agentic harness work? It would help better map the frontier.” / X https://x.com/emollick/status/1994913383871586563




