“UK AISI’s Alignment Team: Research Agenda” by Benjamin Hilton, Jacob Pfau, Marie_DB, Geoffrey Irving

2025/5/7
LessWrong (30+ Karma)

The UK's AI Security Institute published its research agenda yesterday. This post gives more details about how the Alignment Team is thinking about our agenda.

Summary: The AISI Alignment Team focuses on research relevant to reducing risks to safety and security from AI systems that autonomously pursue a course of action which could lead to egregious harm and that are not under human control. No known technical mitigations are reliable past AGI.

Our plan is to break down promising alignment agendas by developing safety case sketches. We'll use these sketches to identify specific holes and gaps in current approaches. We expect that many of these gaps can be formulated as well-defined subproblems within existing fields (e.g., theoretical computer science). By identifying researchers with relevant expertise who aren't currently working on alignment and funding their efforts on these subproblems, we hope to substantially increase parallel progress on alignment.

[...]


Outline:

(01:41) 1. Why safety case-oriented alignment research?

(03:33) 2. Our initial focus: honesty and asymptotic guarantees

(07:07) Example: Debate safety case sketch

(08:58) 3. Future work

(09:02) Concrete open problems in honesty

(12:13) More details on our empirical approach

(14:23) Moving beyond honesty: automated alignment

(15:36) 4. List of open problems we'd like to see solved

(15:53) 4.1 Empirical problems

(17:57) 4.2 Theoretical problems

(21:23) Collaborate with us


First published: May 7th, 2025

Source: https://www.lesswrong.com/posts/tbnw7LbNApvxNLAg8/uk-aisi-s-alignment-team-research-agenda

Narrated by TYPE III AUDIO.


Images from the article: a flowchart diagram. Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.