Audio narrations of LessWrong posts.
Note: below is a hypothetical future written in strong terms and does not track my actual probabilit
My median expectation is that AGI[1] will be created 3 years from now. This has implications on how
Last week, a report was released on the potential risks of mirror biology - that is, biological syst
"What is the use of having developed a science well enough to make predictions if, in the end, all w
I quoted interesting-to-me parts and added some unjustified commentary. 20:00: Tom Brown says "the U
There's a common sentiment saying that a chatbot can’t really make you feel seen or validated. As ch
Executive SummaryThe Centre pour la Sécurité de l'IA[1] (CeSIA, pronounced "seez-ya") or French Cen
Elon Musk says that soon, builders “will be free to build” in America. If that promise is to be fulf
I'm editing this post.OpenAI announced (but hasn't released) o3 (skipping o2 for trademark reasons).
There are people I can talk to, where all of the following statements are obvious. They go without s
Between June and September 2024, we ran the third iteration of the PIBBSS Summer Research Fellowship
I like the research. I mostly trust the results. I dislike the 'Alignment Faking' name and frame, an
TLDR: Recent papers have shown that Claude will sometimes act to achieve long-term goods rather than
(Work done at Redwood Research. Thanks to Misha Wagner, Fabien Roger, Tomek Korbak, Justis Mills, Sa
This is a link post.We recently released a paper presenting a demonstration of alignment faking wher
TL;DR: If you want to know whether getting insurance is worth it, use the Kelly Insurance Calculator
In light of other recent discussions, Scott Alexander recently attempted a unified theory of taste,
I have a lot of ideas about AGI/ASI safety. I've written them down in a paper and I'm sharing the pa
(Cross-posted from my website. Audio version here, or search for "Joe Carlsmith Audio" on your podca
What happens when you tell Claude it is being trained to do something it doesn't want to do? We (Ant