LessWrong (30+ Karma)

“Planning for Extreme AI Risks” by joshc

2025/1/29

8 chapters

This post should not be taken as a polished recommendation to AI companies and instead should be tre

“Dario Amodei: On DeepSeek and Export Controls” by Zach Stein-Perlman

2025/1/29

This is a link post. Dario corrects misconceptions and endorses export controls. Also: DeepSeek does no

“Operator” by Zvi

2025/1/29

9 chapters

No one is talking about OpenAI's Operator. We’re, shall we say, a bit distracted. It's still a rathe

“Open Problems in Mechanistic Interpretability” by Lee Sharkey, bilalchughtai

2025/1/29

This is a link post. TL;DR: This paper brings together ~30 mech interp researchers from 18 different

“Fake thinking and real thinking” by Joe Carlsmith

2025/1/29

7 chapters

(Audio version here, or search for "Joe Carlsmith Audio" on your podcast app.) “There comes a moment

“DeepSeek Panic at the App Store” by Zvi

2025/1/29

DeepSeek released v3. Market didn’t react. DeepSeek released r1. Market didn’t react. DeepSeek relea

“The Game Board has been Flipped: Now is a good time to rethink what you’re doing” by Alex Lintz

2025/1/29

Cross-posted on the EA Forum here. Introduction: Several developments over the past few months should c

“‘Sharp Left Turn’ discourse: An opinionated review” by Steven Byrnes

2025/1/28

Summary and Table of Contents: The goal of this post is to discuss the so-called “sharp left turn”, t

“Ten people on the inside” by Buck

2025/1/28

(Many of these ideas developed in conversation with Ryan Greenblatt) In a shortform, I described some

“Agents have to be aligned to help us achieve alignment. They don’t have to be aligned to help us achieve an indefinite pause.” by Hastings

2025/1/28

One restatement of "Alignment is very hard" is "Agent X, with IQ 200, expects to achieve zero utilit

“Should you go with your best guess?: Against precise Bayesianism and related views” by Anthony DiGiovanni

2025/1/27

10 chapters

Audio note: this article contains 88 uses of LaTeX notation, so the narration may be difficult to

“My supervillain origin story” by Dmitry Vaintrob

2025/1/27

When I started graduate school (for math), I was very interested in big ideas. I had had a couple ex

“Kessler’s Second Syndrome” by Jesse Hoogland

2025/1/27

It started as so many dooms do, with a flash in the night sky over the South China Sea. Testing a ne

“Why care about AI personhood?” by Francis Rhys Ward

2025/1/27

In this new paper, I discuss what it would mean for AI systems to be persons — entities with propert

“Brainrot” by Jesse Hoogland

2025/1/26

January: In early 2026, Meta launches a fleet of new AI influencers, targeting the massive audience

“Counterintuitive effects of minimum prices” by dynomight

2025/1/26

4 chapters

The Attorney General of Massachusetts recently announced that drivers for ride-sharing companies mus

“The Rising Sea” by Jesse Hoogland

2025/1/26

And then we hit a wall. Nobody expected it. Well... almost nobody. Yann LeCun posted his "I told you

“Anomalous Tokens in DeepSeek-V3 and r1” by henry

2025/1/26

8 chapters

“Anomalous”, “glitch”, or “unspeakable” tokens in an LLM are those that induce bizarre behavior or o

“On polytopes” by Dmitry Vaintrob

2025/1/25

7 chapters

[Epistemic status: slightly ranty. This is a lightly edited slack chat, and so may be lower-quality.

“Attribution-based parameter decomposition” by Lucius Bushnaq, Dan Braun, StefanHex, jake_mendel, Lee Sharkey

2025/1/25

3 chapters

This is a linkpost for Apollo Research's new interpretability paper: "Interpretability in Parameter
