Audio narrations of LessWrong posts.
We've recently published a paper about Emergent Misalignment – a surprising phenomenon where trainin
One hell of a paper dropped this week. It turns out that if you fine-tune models, especially GPT-4o
This is a link post. This is not o3; it is what they'd internally called Orion, a larger non-reasonin
A framework for quashing deflection and plausibility mirages. The truth is people lie. Lying isn’t ju
Vegans are often disliked. That's what I read online and I believe there is an element of truth to i
When you have put a lot of ideas together to make an elaborate theory, you want to make sure, when e
Scheming AIs may have secrets that are salient to them, such as: What their misaligned goal is; What
We just published a paper aimed at discovering “computational sparsity”, rather than just sparsity i
The more I learn about urban planning, the more I learn that the above average American city I live
Anthropic has reemerged from stealth and offers us Claude 3.7. Given this is named Claude 3.7, an ex
I like stories where characters wear suits. Since I like suits so much, I realized that I should just
“I often think of the time I met Scott Sumner and he said he pretty much assumes the market is effic
This is the abstract and introduction of our new paper. We show that finetuning state-of-the-art LLM
This is a post in two parts. The first half of the post is about Grok's capabilities, now that we’ve
A new paper by Yoshua Bengio and the Safe Artificial Intelligence For Humanity (SAIFH) team argues t
One way in which I think current AI models are sloppy is that LLMs are trained in a way that messily
This is an 8-page comprehensive summary of the results from Threshold 2030: a recent expert conferen
This post heavily overlaps with “how might we safely pass the buck to AI?” but is written to address
This is a link post. About 1.5 hours ago, Anthropic released Claude 3.7 Sonnet, a hybrid reasoning mo
This work was done as part of the MATS Program - Summer 2024 Cohort. Paper: link Website (with intera