“OpenAI rewrote its Preparedness Framework” by Zach Stein-Perlman

2025/4/16

LessWrong (30+ Karma)

Summary

Thresholds & responses: https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbddebcd/preparedness-framework-v2.pdf#page=5. High and Critical thresholds trigger responses, as in the old Preparedness Framework (PF); responses to Critical thresholds are not yet specified.

Three main categories of capabilities:

Bio/chem: High capabilities trigger security controls and (for external deployment) misuse safeguards
Cyber: High capabilities trigger security controls, (for external deployment) misuse safeguards, and (for large-scale internal deployment) misalignment safeguards
AI Self-improvement: High capabilities trigger security controls

Misuse safeguards, misalignment safeguards, and security controls for High capability levels: https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbddebcd/preparedness-framework-v2.pdf#page=16. My quick takes:

Misuse safeguards: fine categories but it's not clear what level of assurance would suffice
Misalignment safeguards: worrying categories and it's not clear what level of assurance would suffice
Security controls: it's impossible to evaluate security level based on principles like these

[I'll edit this post to add more analysis soon]

First published: April 15th, 2025

Source: https://www.lesswrong.com/posts/Yy5ijtbNfwv8DWin4/openai-rewrote-its-preparedness-framework

---