“Can SAE steering reveal sandbagging?” by jordine, Hoang Khiem, Felix Hofstätter
Duration: 10:28
2025/4/17
LessWrong (30+ Karma)
Chapters
What Did We Discover About SAE Steering and Sandbagging?
Why Did We Explore SAE Features for Uncovering Sandbagging?
How Did We Conduct the Experiment?
What Were the Key Findings?
What Are the Limitations of Our Approach?
What Do These Results Imply for Future Research?
Who Contributed to This Study?