This is a quick announcement/commitment post: I've been working on the PIBBSS Horizon Scanning team (with Lauren Greenspan and Lucas Teixeira), where we have been reviewing some "basic-science-flavored" alignment and interpretability research and doing talent scouting (see the intro doc we have written so far, which we split off from an unfinished larger review). I have also been working on my own research. Aside from active projects, I've accumulated a backlog of technical writeups and shortforms in draft or "Slack discussion"-level form, with varying levels of publishability. This January, I'm planning to edit and publish some of these drafts as posts and shortforms on LW/the Alignment Forum. To keep myself accountable, I'm committing to publish at least 3 posts per week. I'm planning to post about (a subset? superset? overlapping set? of) the following themes:
Opinionated takes on a few research directions [...]
First published: January 2nd, 2025
Source: https://www.lesswrong.com/posts/vkdpw2vCnspK9t7nA/my-january-alignment-theory-nanowrimo
---
Narrated by TYPE III AUDIO.