“Proposal for making credible commitments to AIs.” by Cleo Nardo

2025/6/30

Acknowledgments: The core scheme here was suggested by Prof. Gabriel Weil.

There has been growing interest in the deal-making agenda: humans make deals with AIs (misaligned but lacking a decisive strategic advantage) in which the AIs promise to be safe and useful for some fixed term (e.g. 2026-2028), and we promise to compensate them in the future, conditional on (i) verifying that the AIs were compliant, and (ii) verifying that the AIs would spend the resources in an acceptable way.[1] I think the deal-making agenda breaks down into two main subproblems:

1. How can we make credible commitments to AIs?
2. Would credible commitments motivate an AI to be safe and useful?

There are other issues, but when I've discussed deal-making with people, (1) and (2) are the ones most commonly raised. See the footnote for some other issues in deal-making.[2] Here is my current best assessment of how we can make credible commitments to AIs. [...]

The original text contained 2 footnotes which were omitted from this narration.

---

First published: June 27th, 2025

Source: https://www.lesswrong.com/posts/vxfEtbCwmZKu9hiNr/proposal-for-making-credible-commitments-to-ais

---

Narrated by TYPE III AUDIO.

LessWrong (Curated & Popular)