“SAEBench: A Comprehensive Benchmark for Sparse Autoencoders” by Can, Adam Karvonen, Johnny Lin, Curt Tigges, Joseph Bloom, chanind, Yeu-Tong Lau, Eoin Farrell, Arthur Conmy, CallumMcDougall, Kola Ayonrinde, Matthew Wearden, Sam Marks, Neel Nanda

2024/12/11
LessWrong (30+ Karma)

This is a link post.

Adam Karvonen*, Can Rager*, Johnny Lin*, Curt Tigges*, Joseph Bloom*, David Chanin, Yeu-Tong Lau, Eoin Farrell, Arthur Conmy, Callum McDougall, Kola Ayonrinde, Matthew Wearden, Samuel Marks, Neel Nanda (*equal contribution)

**TL;DR**

We are releasing SAE Bench, a suite of 8 diverse sparse autoencoder (SAE) evaluations including unsupervised metrics and downstream tasks. Use our codebase to evaluate your own SAEs! You can compare 200+ SAEs of varying sparsity, dictionary size, architecture, and training time on Neuronpedia. Think we're missing an eval? We'd love for you to contribute it to our codebase! Email us.

🔍 Explore the Benchmark & Rankings 📊 Evaluate your SAEs with SAEBench ✉️ Contact Us

**Introduction**

Sparse Autoencoders (SAEs) have become one of the most popular tools for AI interpretability. Much recent interpretability work has focused on studying SAEs, and in particular on improving them, e.g. the Gated [...]


Outline:

(00:31) TL;DR

(01:18) Introduction


First published: December 11th, 2024

Source: https://www.lesswrong.com/posts/jGG24BzLdYvi9dugm/saebench-a-comprehensive-benchmark-for-sparse-autoencoders

---

Narrated by TYPE III AUDIO.