“Interpretability Will Not Reliably Find Deceptive AI” by Neel Nanda
13:16
2025/5/4
LessWrong (30+ Karma)
Chapters
Can Interpretability Save Us from Deceptive AI?
Why Does High Reliability in AI Safety Seem Unattainable?
What Limits the Reliability of Interpretability?
The Potential of Black-Box Methods in AI Safety
The Role of Interpretability in a Defense-in-Depth Strategy
Conclusion: What Does the Future Hold for AI Safety?