“Notes on the Long Tasks METR paper, from a HCAST task contributor” by abstractapplic

2025/5/5
LessWrong (30+ Karma)

I contributed one (1) task to HCAST, which was used in METR's Long Tasks paper. This gave me some thoughts I feel moved to share.

**Regarding Baselines and Estimates**

METR's tasks have two sources for how long they take humans: most of those used in the paper were Baselined using playtesters under persistent scrutiny, and some were Estimated by METR.

I don’t quite trust the Baselines. Baseliners were allowed (and incentivized) to drop tasks they weren’t making progress with, and were cut off at the eight-hour mark (mostly, effectively; there's some nuance here I’m ignoring). Baseline times were found by averaging the time taken for successful runs. Taken together, this suggests Baseline estimates will be biased to be at least slightly too low, especially for more difficult tasks.[1]
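To illustrate the direction of that bias, here is a minimal simulation sketch. It is not METR's procedure: the lognormal distribution of completion times, the `sigma` spread, and the `simulate_baseline` helper are all assumptions made up for illustration, and "dropping a task" is simplified to a hard cutoff. Only the eight-hour mark and the rule of averaging successful runs come from the text.

```python
import math
import random
import statistics

CUTOFF_HOURS = 8.0  # from the text: runs were effectively cut off at eight hours


def simulate_baseline(median_hours, sigma=0.8, n_runs=20000, seed=0):
    """Return (true_mean, observed_mean) in hours for one hypothetical task.

    Assumption: uncensored completion times are lognormal with the given
    median. Runs longer than CUTOFF_HOURS count as failures and are excluded
    from the observed average, mirroring "average over successful runs".
    """
    rng = random.Random(seed)
    mu = math.log(median_hours)
    times = [rng.lognormvariate(mu, sigma) for _ in range(n_runs)]
    successes = [t for t in times if t <= CUTOFF_HOURS]
    true_mean = statistics.fmean(times)
    observed_mean = statistics.fmean(successes) if successes else float("nan")
    return true_mean, observed_mean


if __name__ == "__main__":
    for median in (1.0, 3.0, 6.0):
        true_mean, observed_mean = simulate_baseline(median)
        print(f"median={median:.0f}h  true mean={true_mean:.2f}h  "
              f"mean of successful runs={observed_mean:.2f}h")
```

Under these assumptions, the average over successful runs sits below the true average for every task, and the gap widens as the task's typical duration approaches the cutoff. That matches the qualitative claim above; the size of the effect depends entirely on the assumed distribution.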

I really, really don’t trust the Estimates[2]. My task was never successfully Baselined, so METR's main source for how long it would take – [...]


Outline:

(00:22) Regarding Baselines and Estimates

(02:23) Regarding Task Privacy

(04:00) In Conclusion

The original text contained 9 footnotes which were omitted from this narration.


First published: May 4th, 2025

Source: https://www.lesswrong.com/posts/5CGNxadG3JRbGfGfg/notes-on-the-long-tasks-metr-paper-from-a-hcast-task

Narrated by TYPE III AUDIO.