LessWrong (30+ Karma)

Audio narrations of LessWrong posts.

Episodes

Total: 1064

We've recently published a paper about Emergent Misalignment – a surprising phenomenon where trainin

One hell of a paper dropped this week. It turns out that if you fine-tune models, especially GPT-4o

This is a link post. This is not o3; it is what they'd internally called Orion, a larger non-reasonin

A framework for quashing deflection and plausibility mirages. The truth is people lie. Lying isn’t ju

Vegans are often disliked. That's what I read online and I believe there is an element of truth to i

When you have put a lot of ideas together to make an elaborate theory, you want to make sure, when e

Scheming AIs may have secrets that are salient to them, such as: What their misaligned goal is; What

We just published a paper aimed at discovering “computational sparsity”, rather than just sparsity i

“Osaka” by lsusr

2025/2/26

The more I learn about urban planning, the more I learn that the above-average American city I live

Anthropic has reemerged from stealth and offers us Claude 3.7. Given this is named Claude 3.7, an ex

I like stories where characters wear suits. Since I like suits so much, I realized that I should just

“I often think of the time I met Scott Sumner and he said he pretty much assumes the market is effic

This is the abstract and introduction of our new paper. We show that finetuning state-of-the-art LLM

This is a post in two parts. The first half of the post is about Grok's capabilities, now that we’ve

A new paper by Yoshua Bengio and the Safe Artificial Intelligence For Humanity (SAIFH) team argues t

One way in which I think current AI models are sloppy is that LLMs are trained in a way that messily

This is an 8-page comprehensive summary of the results from Threshold 2030: a recent expert conferen

This post heavily overlaps with “how might we safely pass the buck to AI?” but is written to address

This is a link post. About 1.5 hours ago, Anthropic released Claude 3.7 Sonnet, a hybrid reasoning mo

This work was done as part of the MATS Program - Summer 2024 Cohort. Paper: link Website (with intera