Audio narrations of LessWrong posts.
If Anyone Builds It, Everyone Dies. As we announced last month, Eliezer and Nat
Note: This is a research note, and the analysis is less rigorous than our stand
In the fall I am planning to teach an AI safety graduate course at Harvard. The format is likely to
We’re currently in the process of locking in advertisements for the September l
Back in May I did a dramatization of a key and highly painful Senate hearing. Now, we are back for
Over the last two years or so, my girlfriend identified her cycle as having an unusually strong and
This is a rough research note where the primary objective was my own learning. I am sharing it beca
Summary: We found that LLMs exhibit significant race and gender bias in realistic hiring scenarios,
People (including me) often say that scheming models “have to act as if they were aligned”.
The first in a series of bite-sized rationality prompts[1]. This is my most common opening-move f
Notes from a talk originally given at my alma mater. I went to Grinnell College for my undergraduate
The insane attempted AI moratorium has been stripped from the BBB. That doesn’t mean they won’t try
When a claim is shown to be incorrect, defenders may say that the author was just being “sloppy” and
For many people, including me, the real promise of AI is massively accelerated scientific discovery
This sequence draws from a position paper co-written with Simon Pepin Lehalleur, Jesse Hoogland, Ma
Not saying we should pause AI, but consider the following argument: Alignment without the capacity
TLDR: we find that SAEs trained on the difference in activations between a base model and its instr
This post presents some motivation for why we work on model diffing, some of our first results using
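For listeners unfamiliar with the diff-SAE setup mentioned in the two model-diffing entries above, here is a minimal sketch of the idea: collect matched activations from a base model and its instruct-tuned counterpart, subtract them, and train a sparse autoencoder on the differences. Everything below is illustrative rather than the authors' code: random tensors stand in for real activations, and the simple ReLU-plus-L1 architecture, shapes, and hyperparameters are assumptions.

```python
# Hypothetical sketch: training a sparse autoencoder (SAE) on the *difference*
# between base-model and instruct-model activations. Real activations would come
# from hooked forward passes; random tensors stand in for them here.
import torch
import torch.nn as nn

d_model, d_sae, n_tokens = 512, 4096, 10_000  # illustrative sizes

# Placeholder activations: (n_tokens, d_model) residual-stream vectors
base_acts = torch.randn(n_tokens, d_model)
instruct_acts = base_acts + 0.1 * torch.randn(n_tokens, d_model)
diff_acts = instruct_acts - base_acts  # the "diff" the SAE is trained on

class SparseAutoencoder(nn.Module):
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.enc = nn.Linear(d_in, d_hidden)
        self.dec = nn.Linear(d_hidden, d_in)

    def forward(self, x):
        feats = torch.relu(self.enc(x))   # sparse feature activations
        return self.dec(feats), feats     # reconstruction, features

sae = SparseAutoencoder(d_model, d_sae)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3  # sparsity penalty strength (assumed)

for step in range(200):
    batch = diff_acts[torch.randint(0, n_tokens, (256,))]
    recon, feats = sae(batch)
    loss = (recon - batch).pow(2).mean() + l1_coeff * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```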
1) They're unlikely to be sentient (few neurons, immobile). 2) If they are sentient, the farming pra
Anthropic (post June 27th): We let Claude [Sonnet 3.7] manage an automated stor