“Unfaithful Reasoning Can Fool Chain-of-Thought Monitoring” by Benjamin Arnav
07:48
2025/6/2
LessWrong (30+ Karma)
Chapters
Why Does Task Subtlety Matter in AI Safety?
What Vulnerability Does Unfaithful or Misleading Reasoning Pose?
How Does a Hybrid Protocol Enhance Sabotage Detection?
What Are the Limitations and Future Directions of CoT Monitoring?
Conclusion: The Nuanced Efficacy of CoT Monitoring