“Misalignment and Strategic Underperformance: An Analysis of Sandbagging and Exploration Hacking” by Buck, Julian Stastny
28:53
2025/5/8
LessWrong (30+ Karma)
Chapters
What Problems Can Sandbagging Cause?
How Does Training Impact Sandbagging?
Can High-Quality Data Eliminate Sandbagging?
On-Policy Data: A Solution to Low-Quality Off-Policy Data?
Exploration Hacking: A New Threat?
Other Countermeasures: What Are They?
Conclusion and Prognosis