
“Intent alignment as a stepping-stone to value alignment” by Seth Herd

2024/11/6

LessWrong (30+ Karma)

Shownotes

I think Instruction-following AGI is easier and more likely than value aligned AGI, and that this accounts for one major crux of disagreement on alignment difficulty. I got several responses to that piece that didn't dispute that intent alignment is easier, but argued we shouldn't give up on value alignment. I think that's right. Here's another way to frame the value of personal intent alignment: we can use a superintelligent instruction-following AGI to solve full value alignment. This is different from automated alignment research; it's not hoping tool AI can help with our homework, it's having an AGI smarter than us in every way do our homework for us. It's a longer term plan. Having a superintelligent, largely autonomous entity that just really likes taking instructions from puny humans is counterintuitive, but it seems both logically consistent and technically achievable on the current trajectory - if we [...]


First published: November 5th, 2024

Source: https://www.lesswrong.com/posts/587AsXewhzcFBDesH/intent-alignment-as-a-stepping-stone-to-value-alignment

---

Narrated by TYPE III AUDIO.