Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you
Scott Alexander famously warned us to Beware Trivial Inconveniences. When you make a thing easy to do
There's this popular trope in fiction about a character being mind controlled without losing aw
This research was conducted at AE Studio and supported by the AI Safety Grants programme administere
We study alignment audits—systematic investigations into whether an AI is pursuing hidden objectives
The Most Forbidden Technique is training an AI using interpretability techniques. An AI produces a fi
You learn the rules as soon as you’re old enough to speak. Don’t talk to jabberjays. You recite them
Exciting Update: OpenAI has released this blog post and paper which makes me very happy. It's b
LLM-based coding-assistance tools have been out for ~2 years now. Many developers have been reportin
Background: After the release of Claude 3.7 Sonnet,[1] an Anthropic employee started livestreaming C
Note: an audio narration is not available for this article. Please see the original text. The origi
In a recent post, Cole Wyeth makes a bold claim: . . . there is one crucial test (yes this is a crux)
This isn't really a "timeline", as such – I don't know the timings – but this is
This is a critique of How to Make Superbabies on LessWrong. Disclaimer: I am not a geneticist[1], and
This is a link post. Your AI's training data might make it more “evil” and more able to circumve
I recently wrote about complete feedback, an idea which I think is quite important for AI safety. Ho
First, let me quote my previous ancient post on the topic: Effective Strategies for Changing Public O
In a previous book review I described exclusive nightclubs as the particle colliders of sociology—pl
This is the abstract and introduction of our new paper. We show that finetuning state-of-the-art LLM
It doesn’t look good. What used to be the AI Safety Summits were perhaps the most promising thing hap
Note: this is a static copy of this wiki page. We are also publishing it as a post to ensure visibil