“Notes on countermeasures for exploration hacking (aka sandbagging)” by ryan_greenblatt
14:41
2025/3/25
LessWrong (30+ Karma)
Chapters
What Are the Countermeasures for Exploration Hacking?
How Does the Model Handle Generalization When It Messes Up?
What Empirical Evidence Exists for Exploration Hacking?
How Can We Detect Exploration Hacking or Sandbagging?
How Do Neural Architecture Changes Affect Exploration Hacking?
What Future Work Is Needed in This Area?