“Interim Research Report: Mechanisms of Awareness” by Josh Engels, Neel Nanda, Senthooran Rajamanoharan
Duration: 17:15
2025/5/5
LessWrong (30+ Karma)
Chapters
What Did the Researchers Discover?
Introduction to the Study
How Did They Reproduce LLM Risk Awareness on Gemma 3 12B?
Is It Just a Steering Vector?
Can the Steering Vector Be Trained Directly?
Are the Mechanisms for Awareness and Behavior the Same?
Risk Backdoors: What’s the Impact?
Investigating Further: What Else Did They Find?
Steering Vectors: Implementing Conditional Behavior