“Intrinsic power-seeking: AI Might Seek Power for Power’s Sake” by TurnTrout

2024/11/20

This is a link post.

I think AI agents (trained end-to-end) might intrinsically prefer power-seeking, in addition to whatever instrumental drives they gain.

The logical structure of the argument

Premises

1. People will configure AI systems to be autonomous and reliable in order to accomplish tasks.
2. This configuration process will reinforce & generalize behaviors which complete tasks reliably.
3. Many tasks involve power-seeking.
4. The AI will complete these tasks by seeking power.
5. The AI will be repeatedly reinforced for its historical actions which seek power.
6. There is a decent chance the reinforced circuits (“subshards”) prioritize gaining power for the AI's own sake, not just for the user's benefit.

Conclusion: There is a decent chance the AI seeks power for itself, when possible.

Read the full post at turntrout.com/intrinsic-power-seeking

Find out when I post more content: newsletter & RSS

First published: November 19th, 2024

Source: https://www.lesswrong.com/posts/LWfYjZgXHN5GYtYpH/intrinsic-power-seeking-ai-might-seek-power-for-power-s-sake

---

Narrated by TYPE III AUDIO.

LessWrong (30+ Karma)
