“Is Gemini now better than Claude at Pokémon?” by Julian Bradshaw

2025/4/20
LessWrong (30+ Karma)

Background: With the release of Claude 3.7 Sonnet, Anthropic promoted a new benchmark: beating Pokémon. Now, Google claims Gemini 2.5 Pro has substantially surpassed Claude's progress on that benchmark.

TL;DR: We don't know if Gemini is better at Pokémon than Claude because their playthroughs can't be directly compared.

**The Metrics**

Here are Anthropic's and Google's charts:

[1] Unfortunately, the two charts use different x and y axes, but it's roughly accurate to say that Gemini has made it nearly twice as far in the game[2] now:

And moreover, Gemini has gotten there using approximately 1/3rd the effort! As of writing, Gemini's current run is at ~68,000 actions, while Claude's current run is at ~215,000 actions.[3][4]
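
As a quick check of the "approximately 1/3rd" figure, the ratio can be computed directly from the two approximate action counts quoted above. This is only a sketch: the variable names are illustrative, and the counts are the rounded values given in the post as of writing.

```python
# Approximate action counts quoted in the post (as of writing).
gemini_actions = 68_000
claude_actions = 215_000

# Fraction of Claude's action count that Gemini has used so far.
ratio = gemini_actions / claude_actions
print(f"Gemini / Claude actions: {ratio:.2f}")
```

The result is about 0.32, which is consistent with "roughly one-third".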

So, sounds definitive, right? Gemini blows Claude out of the water.

**The Agents' Harnesses**

Well, when Logan Kilpatrick (product lead for Google AI Studio) posted his tweet, he gave an important caveat:

"next best model only [...]


Outline:

(01:13) The Metrics

(02:19) The Agents' Harnesses

(05:13) The Agents' Masters

(07:46) The Agents' Vibes

(11:27) Conclusion

The original text contained 14 footnotes which were omitted from this narration.


First published: April 19th, 2025

Source: https://www.lesswrong.com/posts/7mqp8uRnnPdbBzJZE/is-gemini-now-better-than-claude-at-pokemon

---

Narrated by TYPE III AUDIO.

