“Interpretability Will Not Reliably Find Deceptive AI” by Neel Nanda
13:15
2025/5/5
LessWrong (Curated & Popular)
Chapters
Is Interpretability the Only Path to Detecting Deception in AI?
Why Does High Reliability in AI Safety Seem Unattainable?
What Are the Limitations of Interpretability in Detecting Deceptive AI?
Can Black-Box Methods Offer a Solution?
What Role Does Interpretability Play in AI Safety?
Conclusion