“Why ‘training against scheming’ is hard” by Marius Hobbhahn
24:03
2025/6/24
LessWrong (30+ Karma)
Chapters
Why is Training Against Scheming Important?
Scheming: A Result of Goal Conflicts and Smart AIs
Stock Trader Analogy: Understanding Scheming
Why is the Training Distribution Too Narrow?
What Recommendations Can We Make?
How Does Overwhelming Pressure Affect Training?
Stock Trader Analogy: Overwhelming Pressure
What Recommendations Can We Make for Overwhelming Pressure?
How Does the AI Exploit an Imperfect Reward Model?
Stock Trader Analogy: Exploiting the Reward Model
What Recommendations Can We Make for Reward Model Exploits?
What is Deceptive Alignment?
Stock Trader Analogy: Deceptive Alignment
What Recommendations Can We Make for Deceptive Alignment?
Conclusion: What Does This Mean for AI Alignment?