“An overview of areas of control work” by ryan_greenblatt

2025/3/26
LessWrong (30+ Karma)

In this post, I'll list all the areas of control research (and implementation) that seem promising to me.

This references framings and abstractions discussed in Prioritizing threats for AI control.

First, here is a division of different areas, though note that these areas have some overlaps:

  • Developing and using settings for control evaluations (including technique iteration)

  • Better understanding control-relevant capabilities and model properties

  • Developing specific countermeasures/techniques in isolation (rather than via iterating in a setting)

  • Doing control experiments on actual current AI usage

  • Prototyping or implementing software infrastructure for control and security measures that are particularly relevant to control

  • Prototyping or implementing human processes for control

  • Conceptual research and other ways of researching control other than empirical experiments

  • Nearer-term applications of control to practice and accelerate later usage of control

This division is imperfect and somewhat [...]


Outline:

(01:43) Developing and using settings for control evaluations

(13:51) Better understanding control-relevant capabilities and model properties

(28:21) Developing specific countermeasures/techniques in isolation

(36:01) Doing control experiments on actual AI usage

(40:43) Prototyping or implementing software infrastructure for control and security measures that are particularly relevant to control

(41:58) Prototyping or implementing human processes for control

(46:50) Conceptual research and other ways of researching control other than empirical experiments

(49:37) Nearer-term applications of control to practice and accelerate later usage of control


First published: March 25th, 2025

Source: https://www.lesswrong.com/posts/Eeo9NrXeotWuHCgQW/an-overview-of-areas-of-control-work

---


Narrated by TYPE III AUDIO.