“For scheming, we should first focus on detection and then on prevention” by Marius Hobbhahn

2025/3/4

This is a personal post and does not necessarily reflect the opinion of other members of Apollo Research.

If we want to argue that the risk of harm from scheming in an AI system is low, we could, among others, make the following arguments:

Detection: If our AI system is scheming, we have good reasons to believe that we would be able to detect it.

Prevention: We have good reasons to believe that our AI system has a low scheming propensity or that we could stop scheming actions before they cause harm.

In this brief post, I argue why we should first prioritize detection over prevention, assuming you cannot pursue both at the same time, e.g. due to limited resources. In short, a) early on, the information value is more important than risk reduction because current models are unlikely to cause big harm but we can already learn a lot [...]

Outline:

(01:07) Techniques

(04:41) Reasons to prioritize detection over prevention

First published: March 4th, 2025

Source: https://www.lesswrong.com/posts/bAWPsgbmtLf8ptay6/for-scheming-we-should-first-focus-on-detection-and-then-on

---

Narrated by TYPE III AUDIO.

LessWrong (30+ Karma)

What Techniques Can We Use to Detect Scheming in AI Systems?

Why Should We Prioritize Detection Over Prevention in AI Scheming?
