Audio narrations of LessWrong posts.
In this episode of our podcast, Timothy Telleen-Lawton and I talk to Oliver Habryka of Lightcone Infrastructure.
John: So there's this thing about interp, where most of it seems to not be handling one of the stan
Introduction: Focusmate changed my life. I started using it mid-2023 and have been a power user since then.
GPT-4o tells you what it thinks you want to hear. The results of this were rather ugly. You get ex
tl;dr This post is an update on the Proceedings of ILIAD, a conference journal for AI alignment research.
In this post, we list 7 of our favorite easy-to-start directions in AI control. (Really, projects t
This is post 2 of a sequence on my framework for doing and thinking about research. Start here. Bef
Our universe is probably a computer simulation created by a paperclip maximizer to map the spectrum
This is a link post. I've gotten a lot of value out of the details of how other people use LLMs, so
This is a link post. So this post is an argument that multi-decade timelines are reasonable, and the
This is a link post. Dario Amodei posted a new essay titled "The Urgency of Interpretability" a couple
For a lay audience, but I've seen a surprising number of knowledgeable people fretting over depress
This is the first post in a sequence about how I think about and break down my research process. Po
As I think about "what to do about AI x-risk?", some principles that seem useful to me: Short time
This is a personal post and does not necessarily reflect the opinion of other members of Apollo Research.
A common claim is that concern about [X] ‘distracts’ from concern about [Y]. This is often used as
Enjoy it while it lasts. The Claude 4 era, or the o4 era, or both, are coming soon. Also, welcome t
What in retrospect seem like serious moral crimes were often widely accepted while they were happening.
Something's changed about reward hacking in recent systems. In the past, reward hacks were usually a
We've published an essay series on what we call the intelligence curse. Most content is brand new,