“Will alignment-faking Claude accept a deal to reveal its misalignment?” by ryan_greenblatt
43:18
2025/2/1
LessWrong (Curated & Popular)
Chapters
What were the key results of the alignment-faking experiment?
What are the models' objections and how do they spend the compensation?
Why did Ryan undertake this research?
What complications arise with commitments in these deals?
What are the detailed results of the experiments?
How are model objections reviewed and what follow-up conversations were held?