[Linkpost] “METR: Measuring AI Ability to Complete Long Tasks” by Zach Stein-Perlman

2025/3/19
LessWrong (30+ Karma)

This is a link post.

Summary: We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has increased exponentially and consistently over the past 6 years, with a doubling time of around 7 months. Extrapolating this trend predicts that, in under a decade, we will see AI agents that can independently complete a large fraction of software tasks that currently take humans days or weeks.
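To make the extrapolation concrete, here is a minimal sketch of the arithmetic behind a fixed doubling time. It assumes the trend continues unchanged, which the summary presents as an extrapolation and not a guarantee. The function names and the starting and target task lengths are hypothetical, chosen for illustration only; only the 7-month doubling time comes from the summary.

```python
import math

# Doubling time stated in the summary ("around 7 months").
DOUBLING_MONTHS = 7.0


def horizon_after(start_hours: float, months: float,
                  doubling_months: float = DOUBLING_MONTHS) -> float:
    """Task length (in hours) after `months`, assuming steady exponential growth."""
    return start_hours * 2.0 ** (months / doubling_months)


def months_until(start_hours: float, target_hours: float,
                 doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months needed for the task length to grow from start_hours to target_hours."""
    return doubling_months * math.log2(target_hours / start_hours)


if __name__ == "__main__":
    start = 1.0     # hypothetical starting task length in hours (not from the source)
    target = 160.0  # hypothetical target: roughly four 40-hour work weeks
    months = months_until(start, target)
    print(f"{months:.1f} months ({months / 12:.1f} years) to go from "
          f"{start:g} h to {target:g} h at a {DOUBLING_MONTHS:g}-month doubling time")
```

Because growth is exponential, each additional doubling costs the same 7 months no matter how long the tasks already are, which is why a jump from hours to weeks fits within a span of years.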

Full paper | GitHub repo


First published: March 19th, 2025

Source: https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks

Linkpost URL: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/

---

Narrated by TYPE III AUDIO.

