“@nearcyan I think this is a pretty good analysis, the one part you don’t mention but further added to this, is the almost simultaneous Gemini flash thinking model that’s equally good (if not better) than r1, and about as cheap, but 1) American bigtech 2) not Chinese 3) not open weights”
https://x.com/giffmana/status/1884689840278548644
“Nice way to test when AI can replace human evaluators & judges. The test compares if LLMs align better with group consensus than individual human evaluators do. GPT-4 & Gemini pass the test in 8/10 tasks, but struggle with deep contextual understanding. Few-shot learning helps.”
https://x.com/emollick/status/1883632790811586947
“my elo-price pareto frontier chart went semi viral recently, but here’s a nice qualitative eval: TLDR: Gemini 2.0 Flash is not only more efficient at complex abstractive reporting than O1 (previous SOTA), it does so while being 200x cheaper*! — BEGIN CONTEXT — we now run
https://x.com/swyx/status/1884775744917967191
The Gemini app is now powered by Gemini 2.0 Flash.
https://blog.google/feed/gemini-app-model-update-january-2025/




