“SHIFT relies on token-level features to de-bias Bias in Bios probes” by Tim Hua
13:16
2025/3/22
LessWrong (30+ Karma)
Chapters
What Is SHIFT, and What Is Its Background?
How Does the SHIFT Experiment Rely on Embedding Features?
Can You Train an Unbiased Classifier by Deleting Gender-Related Tokens?
De-biasing by Removing Gender-Related Tokens from Embeddings?
Is the Bias in Bios Task Sufficient to Validate SHIFT?
Applying SHIFT to Classifiers and Reward Models
Using SHIFT for Cognition-Based Oversight and Disambiguating Classifiers
What Are the Next Steps for SHIFT?