“How can we solve diffuse threats like research sabotage with AI control?” by Vivek Hebbar
14:57
2025/5/1
LessWrong (30+ Karma)
Chapters
What are the threat model and basic countermeasures for AI research sabotage?
How can deployers train against and catch AI sabotage during the research process?
What are the common sabotage strategies in AI research?
Where are the most vulnerable points for AI sabotage?
Is withholding good content more damaging than producing bad content in AI sabotage?
How does code sabotage differ from idea sabotage in AI research?
What terminology should we use to describe the spectrum of sabotage from concentrated to diffuse?
Canary string: a method to detect AI sabotage?
Acknowledgements