I think Instruction-following AGI is easier and more likely than value-aligned AGI, and that this accounts for one major crux of disagreement on alignment difficulty. I got several responses to that piece that didn't dispute that intent alignment is easier, but argued we shouldn't give up on value alignment. I think that's right.

Here's another way to frame the value of personal intent alignment: we can use a superintelligent instruction-following AGI to solve full value alignment. This is different from automated alignment research; it's not hoping tool AI can help with our homework, it's making an AGI smarter than us in every way do our homework for us. It's a longer-term plan. Having a superintelligent, largely autonomous entity that just really likes taking instructions from puny humans is counterintuitive, but it seems both logically consistent and technically achievable on the current trajectory - if we [...]
First published: November 5th, 2024