“My January alignment theory Nanowrimo” by Dmitry Vaintrob

2025/1/3

This is a quick announcement/commitment post. I've been working on the PIBBSS Horizon Scanning team (with Lauren Greenspan and Lucas Teixeira), where we have been reviewing some "basic-science-flavored" alignment and interpretability research and doing talent scouting (see this intro doc we have written so far, which we split off from an unfinished larger review). I have also been working on my own research. Aside from active projects, I've accumulated a bit of a backlog of technical writeups and shortforms in draft or "Slack discussion"-level form, with various levels of publishability. This January, I'm planning to edit and publish some of these drafts as posts and shortforms on LW/the Alignment Forum. To keep myself accountable, I'm committing to publish at least 3 posts per week. I'm planning to post about (a subset? superset? overlapping set? of) the following themes:

Opinionated takes on a few research directions [...]

First published: January 2nd, 2025

Source: https://www.lesswrong.com/posts/vkdpw2vCnspK9t7nA/my-january-alignment-theory-nanowrimo

---

Narrated by TYPE III AUDIO.

LessWrong (30+ Karma)
