
LessWrong (30+ Karma)

Audio narrations of LessWrong posts.

Episodes

Total: 1064

This is a link post. If Anyone Builds It, Everyone Dies. As we announced last month, Eliezer and Nat

This is a link post. Note: This is a research note, and the analysis is less rigorous than our stand

In the fall I am planning to teach an AI safety graduate course at Harvard. The format is likely to

This is a link post. We’re currently in the process of locking in advertisements for the September l

Back in May I did a dramatization of a key and highly painful Senate hearing. Now, we are back for

Over the last two years or so, my girlfriend identified her cycle as having an unusually strong and

This is a rough research note where the primary objective was my own learning. I am sharing it beca

Summary: We found that LLMs exhibit significant race and gender bias in realistic hiring scenarios,

People (including me) often say that scheming models “have to act as if they were aligned”.

The first in a series of bite-sized rationality prompts[1]. This is my most common opening-move f

Notes from a talk originally given at my alma mater. I went to Grinnell College for my undergraduate

The insane attempted AI moratorium has been stripped from the BBB. That doesn’t mean they won’t try

When a claim is shown to be incorrect, defenders may say that the author was just being “sloppy” and

For many people, including me, the real promise of AI is massively accelerated scientific discovery

This sequence draws from a position paper co-written with Simon Pepin Lehalleur, Jesse Hoogland, Ma

Not saying we should pause AI, but consider the following argument: Alignment without the capacity

TLDR: we find that SAEs trained on the difference in activations between a base model and its instr

This post presents some motivation on why we work on model diffing, some of our first results using

1) They're unlikely to be sentient (few neurons, immobile) 2) If they are sentient, the farming pra

This is a link post. Anthropic (post June 27th): We let Claude [Sonnet 3.7] manage an automated stor