Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you
“Anomalous”, “glitch”, or “unspeakable” tokens in an LLM are those that induce bizarre behavior or o
This is the abstract and introduction of our new paper, with some discussion of implications for AI
The Cake. Imagine that I want to bake a chocolate cake, and my sole goal in my entire lightcone and e
This post offers an accessible model of psychology of character-trained LLMs like Claude. Epistemic
This is a link post. This is a blog post reporting some preliminary work from the Anthropic Alignment
One hope for keeping existential risks low is to get AI companies to (successfully) make high-assura
Cross-posted from Telescopic Turnip. As we all know, humans are terrible at building butterflies. We c
This is a link post. A story I wrote about living through the transition to utopia. This is the one st
This is a link post. Present alongside President Trump: Sam Altman, Larry Ellison (Oracle executive ch
The AI Control Agenda, in its own words: … we argue that AI labs should ensure that powerful AIs are
I think a lot of people have heard so much about internalized prejudice and bias that they think the
(Both characters are fictional, loosely inspired by various traits from various real people. Be care
From AI scientist to AI research fleet. Research automation is here (1, 2, 3). We saw it coming and p
So we want to align future AGIs. Ultimately we’d like to align them to human values, but in the shor
Traditional economics thinking has two strong principles, each based on abundant historical data: Pr
All quotes, unless otherwise marked, are Tolkien's words as printed in The Letters of J.R.R.Tol
The anonymous review of The Anti-Politics Machine published on Astral Codex X focuses on a case stud
Crossposted from my personal blog. I was inspired to cross-post this here given the discussion that
TL;DR: There may be a fundamental problem with interpretability work that attempts to understand neu
Funding for $150bn training systems just turned less speculative, with OpenAI o3 reaching 25% on Fro