“An alignment safety case sketch based on debate” by Benjamin Hilton, Marie_DB, Jacob Pfau, Geoffrey Irving
54:33
2025/5/9
LessWrong (30+ Karma)
Chapters
Can Debate Solve the Alignment Problem for ASI?
What is the Alignment Strategy Through Debate?
How Do We Define a Low-Stakes Deployment Context?
Training AI Models Through a Debate Game
Ensuring Exploration Guarantees During Training
Continuing Online Training During Deployment
What Does the Safety Case Sketch Look Like?
Understanding the Notation Used in the Paper
Key Claim 1: Effective Game Play Through Training
Key Claim 2: Incentivizing Correctness in the Game
Key Claim 3: Consistent Behavior During Deployment
Key Claim 4: Sufficient Safety Through Correct Answers
Extending the Safety Case to High-Stakes Contexts
What Are the Open Problems in This Research?
Conclusion and Final Thoughts
Appendix 1: Full Safety Case Diagram
Appendix 2: CAE Notation