
“For scheming, we should first focus on detection and then on prevention” by Marius Hobbhahn

2025/3/4

LessWrong (30+ Karma)


This is a personal post and does not necessarily reflect the opinion of other members of Apollo Research. If we want to argue that the risk of harm from scheming in an AI system is low, we could, among others, make the following arguments:

Detection: If our AI system is scheming, we have good reasons to believe that we would be able to detect it.

Prevention: We have good reasons to believe that our AI system has a low scheming propensity or that we could stop scheming actions before they cause harm.

In this brief post, I argue why we should first prioritize detection over prevention, assuming you cannot pursue both at the same time, e.g. due to limited resources. In short, a) early on, the information value is more important than risk reduction because current models are unlikely to cause big harm but we can already learn a lot [...]


Outline:

(01:07) Techniques

(04:41) Reasons to prioritize detection over prevention


First published: March 4th, 2025

Source: https://www.lesswrong.com/posts/bAWPsgbmtLf8ptay6/for-scheming-we-should-first-focus-on-detection-and-then-on

    ---
    

Narrated by TYPE III AUDIO.