“Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols?” by Alex Mallen, charlie_griffin, Buck Shlegeris
18:46
2025/3/26
LessWrong (30+ Karma)
Chapters
What is the purpose of an AI control protocol?
Overview of the Paper
Summary of Key Results
Qualitative Analysis of the Findings
How do these results impact our understanding of AI risks?
Using the Evaluation with More Capable Models
Challenges in Achieving Conservative Measurement
Strategies for Reducing Noise in the Evaluation