This is a link post.
Adam Karvonen*, Can Rager*, Johnny Lin*, Curt Tigges*, Joseph Bloom*, David Chanin, Yeu-Tong Lau, Eoin Farrell, Arthur Conmy, Callum McDougall, Kola Ayonrinde, Matthew Wearden, Samuel Marks, Neel Nanda
*equal contribution
**TL;DR**
We are releasing SAE Bench, a suite of 8 diverse sparse autoencoder (SAE) evaluations including unsupervised metrics and downstream tasks. Use our codebase to evaluate your own SAEs! You can compare 200+ SAEs of varying sparsity, dictionary size, architecture, and training time on Neuronpedia. Think we're missing an eval? We'd love for you to contribute it to our codebase! Email us.
🔍 Explore the Benchmark & Rankings 📊 Evaluate your SAEs with SAEBench ✉️ Contact Us
**Introduction**
Sparse Autoencoders (SAEs) have become one of the most popular tools for AI interpretability. Much recent interpretability work has focused on studying SAEs, and in particular on improving them, e.g. the Gated [...]
Outline:
(00:31) TL;DR
(01:18) Introduction
First published: December 11th, 2024
---
Narrated by TYPE III AUDIO.