“Do models say what they learn?” by Andy Arditi, marvinli, Joe Benton, Miles Turpin
27:25
2025/3/22
LessWrong (30+ Karma)
Chapters
Do models say what they learn?
Why is CoT monitoring important for AI oversight?
What methodology did the researchers use?
Case study: loan recommendations based on nationality
Is the model's behavior surprising?
What does the discussion reveal?
Appendix and author contributions