“OpenAI rewrote its Preparedness Framework” by Zach Stein-Perlman

2025/4/16
LessWrong (30+ Karma)

New: https://openai.com/index/updating-our-preparedness-framework/

Old: https://cdn.openai.com/openai-preparedness-framework-beta.pdf

Summary

Thresholds & responses: https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbddebcd/preparedness-framework-v2.pdf#page=5. As in the old Preparedness Framework (PF), High and Critical thresholds trigger responses; responses to Critical thresholds are not yet specified.

Three main categories of capabilities:

  • Bio/chem: High capabilities trigger security controls and (for external deployment) misuse safeguards
  • Cyber: High capabilities trigger security controls, misuse safeguards (for external deployment), and misalignment safeguards (for large-scale internal deployment)
  • AI Self-improvement: High capabilities trigger security controls

Misuse safeguards, misalignment safeguards, and security controls for High capability levels: https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbddebcd/preparedness-framework-v2.pdf#page=16. My quick takes:

  • Misuse safeguards: fine categories but it's not clear what level of assurance would suffice
  • Misalignment safeguards: worrying categories and it's not clear what level of assurance would suffice
  • Security controls: it's impossible to evaluate security level based on principles like these

[I'll edit this post to add more analysis soon]


First published: April 15th, 2025

Source: https://www.lesswrong.com/posts/Yy5ijtbNfwv8DWin4/openai-rewrote-its-preparedness-framework

Narrated by TYPE III AUDIO.