LessWrong (30+ Karma)

[Linkpost] “Modifying LLM Beliefs with Synthetic Document Finetuning” by RowanWang, Johannes Treutlein, Ethan Perez, Fabien Roger, Sam Marks

2025/4/24

This is a link post. In this post, we study whether we can modify an LLM's beliefs and investigate w

“‘The Era of Experience’ has an unsolved technical alignment problem” by Steven Byrnes

2025/4/24

10 chapters

Every now and then, some AI luminaries (1) propose that the future of powerful AI will be reinforc

[Linkpost] “My Favorite Productivity Blog Posts” by Parker Conley

2025/4/24

This is a link post. I’ve read at least a few hundred blog posts, maybe upwards of a thousand. Agree

“OpenAI Alums, Nobel Laureates Urge Regulators to Save Company’s Nonprofit Structure” by garrison

2025/4/24

8 chapters

Converting to a for-profit model would undermine the company's founding mission to ensure AGI "bene

“o3 Is a Lying Liar” by Zvi

2025/4/23

5 chapters

I love o3. I’m using it for most of my queries now. But that damn model is a lying liar. Who lies.

“Putting up Bumpers” by Sam Bowman

2025/4/23

tl;dr: Even if we can't solve alignment, we can solve the problem of catching and fixing misalignm

[Linkpost] “Jaan Tallinn’s 2024 Philanthropy Overview” by jaan

2025/4/23

This is a link post. to follow up my philanthropic pledge from 2020, i've updated my philanthropy pag

[Linkpost] “To Understand History, Keep Former Population Distributions In Mind” by Arjun Panickssery

2025/4/23

This is a link post. Guillaume Blanc has a piece in Works in Progress (I assume based on his paper)

“The EU Is Asking for Feedback on Frontier AI Regulation (Open to Global Experts)—This Post Breaks Down What’s at Stake for AI Safety” by Katalina Hernandez

2025/4/23

16 chapters

The European AI Office is currently writing the rules for how general-purpose AI (GPAI) models will

“Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt” by Joel Z. Leibo, Wilcunningham, Seb Krier, Manfred Diaz

2025/4/22

7 chapters

Joel Z. Leibo [1], Alexander Sasha Vezhnevets [1], William A. Cunningham [1, 2], Sébastien Krier [1

“You Better Mechanize” by Zvi

2025/4/22

10 chapters

Or you had better not. The question is which one. This post covers the announcement of Mechanize, t

“The US Executive vs Supreme Court Deportations Clash” by NunoSempere

2025/4/22

8 chapters

Forecaster perspectives: Sentinel forecasters in aggregate assess as “83% true” (65% to 100%) the st

“Accountability Sinks” by Martin Sustrik

2025/4/22

Back in the 1990s, ground squirrels were briefly fashionable pets, but their popularity came to an

“The Uses of Complacency” by sarahconstantin

2025/4/22

9 chapters

Our Culture Expects Self-Justification: I really like David Chapman's explication of what

“$500 Bounty Problem: Are (Approximately) Deterministic Natural Latents All You Need?” by johnswentworth, David Lorell

2025/4/21

7 chapters

Audio note: this article contains 36 uses of latex notation, so the narration may be difficult to

“Crime and Punishment #1” by Zvi

2025/4/21

25 chapters

This seemed like a good next topic to spin off from monthlies and make into its own occasional seri

“AI 2027 is a Bet Against Amdahl’s Law” by snewman

2025/4/21

5 chapters

AI 2027 lies at a Pareto frontier – it contains the best researched argument for short timelines, o

“Research Notes: Running Claude 3.7, Gemini 2.5 Pro, and o3 on Pokémon Red” by Julian Bradshaw

2025/4/21

10 chapters

Disclaimer: this post was not written by me, but by a friend who wishes to remain anonymous. I did

“How Close We Are to a Complete List of Imprinted Genes” by Morpheus

2025/4/20

9 chapters

This post summarizes some of the research I have been doing for Bootstrap Bio AKA kman and Genesmit

“Impact, agency, and taste” by benkuhn

2025/4/20

6 chapters

I’ve been thinking recently about what sets apart the people who’ve done the best work at Anthropic
