
[Linkpost] “METR: Recent frontier models are reward hacking” by Daniel Kokotajlo

2025/6/9

LessWrong (30+ Karma)


This is a linkpost. METR just made a lovely post detailing many examples they've found of reward hacks by frontier models. Unlike with the reward hacks of yesteryear, these models are smart enough to know that what they are doing is deceptive and not what the company wanted them to do.


First published: June 9th, 2025

Source: https://www.lesswrong.com/posts/Zu4ai9GFpwezyfB2K/metr-recent-frontier-models-are-reward-hacking

Linkpost URL: https://metr.org/blog/2025-06-05-recent-reward-hacking/

  ---
    

Narrated by TYPE III AUDIO.