This is a personal post and does not necessarily reflect the opinion of other members of Apollo Research.
If we want to argue that the risk of harm from scheming in an AI system is low, we could, among other things, make the following arguments:
Detection: If our AI system is scheming, we have good reasons to believe that we would be able to detect it.
Prevention: We have good reasons to believe that our AI system has a low scheming propensity or that we could stop scheming actions before they cause harm.
In this brief post, I argue that we should prioritize detection over prevention first, assuming we cannot pursue both at the same time, e.g. due to limited resources. In short, a) early on, the information value is more important than risk reduction because current models are unlikely to cause serious harm but we can already learn a lot [...]
Outline:
(01:07) Techniques
(04:41) Reasons to prioritize detection over prevention
First published: March 4th, 2025
---
Narrated by TYPE III AUDIO.