“Political sycophancy as a model organism of scheming” by Alex Mallen, Vivek Hebbar
Duration: 27:16
2025/5/12
LessWrong (30+ Karma)
Chapters
What are the hopes and perils of training away scheming behavior?
How is scheming behavior analogous to other forms of manipulation?
What is adversarial training and how does it work?
What insights do the results provide about the effects of adversarial training?
What is non-adversarial training and how does it differ?
Does non-adversarial training with a high learning rate destroy AI capabilities?
How do these findings compare to prior work like Sleeper Agents?
What other results did the study reveal?
What future work is planned in this area?
Appendix: Additional Data and Metrics