Background: With the release of Claude 3.7 Sonnet, Anthropic promoted a new benchmark: beating Pokémon. Now, Google claims Gemini 2.5 Pro has substantially surpassed Claude's progress on that benchmark.
TL;DR: We don't know if Gemini is better at Pokémon than Claude because their playthroughs can't be directly compared.
**The Metrics**
Here are Anthropic's and Google's charts:
[1] Unfortunately, these use different x- and y-axes, but it's roughly accurate to say that Gemini has made it nearly twice as far in the game[2] now:
And moreover, Gemini has gotten there using approximately one third of the effort! As of writing, Gemini's current run is at ~68,000 actions, while Claude's current run is at ~215,000 actions.[3][4]
So, sounds definitive, right? Gemini blows Claude out of the water.
**The Agents' Harnesses**
Well, when Logan Kilpatrick (product lead for Google's AI Studio) posted his tweet, he gave an important caveat:
"next best model only [...]
Outline:
(01:13) The Metrics
(02:19) The Agents' Harnesses
(05:13) The Agents' Masters
(07:46) The Agents' Vibes
(11:27) Conclusion
The original text contained 14 footnotes which were omitted from this narration.
First published: April 19th, 2025
Source: https://www.lesswrong.com/posts/7mqp8uRnnPdbBzJZE/is-gemini-now-better-than-claude-at-pokemon
---
Narrated by TYPE III AUDIO.