“Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)” by Neel Nanda, lewis smith, Senthooran Rajamanoharan, Arthur Conmy, Callum McDougall, Tom Lieberum, János Kramár, Rohin Shah
57:33
2025/3/26
LessWrong (30+ Karma)
Chapters
What's the TL;DR of the SAE Research Update?
Why Did the Team Start This Research?
What Was the Main Task of the Research?
What Conclusions Did They Draw and How Did It Impact Their Strategy?
How Did They Compare Different Training Methods for Chat SAEs?
Can SAEs Be Used for Out-of-Distribution Probing?
What Was the Technical Setup for the Research?
What Were the Results of the Probing Experiment?
What Related Work and Discussions Were Mentioned?
Is It Surprising That SAEs Didn't Work as Expected?
How Can SAEs Be Used for Dataset Debugging?
What Are High Frequency Latents and How Do They Impact SAEs?
How Did They Modify the Sparsity Penalty in SAEs?
How Did They Evaluate the Interpretability of SAEs?
What Were the Final Results and Conclusions?
What Additional Insights Were Provided in the Appendix?