Audio narrations of LessWrong posts.
One key hope for mitigating risk from misalignment is inspecting the AI's behavior, noticing that i
There's an implicit model I think many people have in their heads of how everyone else behaves. As
This is a link post. Definition In 1954, Roger Bannister ran the first officially sanctioned sub-4-m
This is a link post. Executive summary The Trump administration backtracked from its tariff plan, re
Epistemic status: These are results of a brief research sprint and I didn't have time to investigat
Dario Amodei, CEO of Anthropic, recently worried about a world where only 30% of jobs become automa
Epistemic status: a model I find helpful to make sense of disagreements and, sometimes, resolve the
TL;DR: If we optimize a steering vector to induce a language model to output a single piece of harm
TL;DR: I claim that many reasoning patterns that appear in chains-of-thought are not actually used
Thanks to Linda Linsefors for encouraging me to write my story. Although it might not generalize to
I'm graduating from UChicago in around 60 days, and I've been thinking about what I've learned thes
Introduction This is a nuanced “I was wrong” post. Something I really like about AI safety and EA/r
Epistemic status: Noticing confusion There is little discussion happening on LessWrong with regards
In this post I lay out a concrete vision of how reward-seekers and schemers might function. I descr
Cross-posted from Substack. AI job displacement will affect young people first, disrupting the usua
Paper is good. Somehow, a blank page and a pen makes the universe open up before you. Why paper has
Summary OpenAI recently released the Responses API. Most models are available through both the new
It's generally agreed that as AIs get more capable, risks from misalignment increase. But there are
Google Lays Out Its Safety Plans I want to start off by reiterating kudos to Google for actually
Authors: Eli Lifland, Nikola Jurkovic[1], FutureSearch[2]. This is supporting research for AI 2027. We