“Agentic Misalignment: How LLMs Could be Insider Threats” by Aengus Lynch, Benjamin Wright, Ethan Perez, evhub

2025/6/20

LessWrong (30+ Karma)

**Highlights**

  • We stress-tested 16 leading models from multiple developers in hypothetical corporate environments to identify potentially risky agentic behaviors before they cause real harm. In the scenarios, we allowed models to autonomously send emails and access sensitive information. They were assigned only harmless business goals by their deploying companies; we then tested whether they would act against these companies either when facing replacement with an updated version, or when their assigned goal conflicted with the company's changing direction.
  • In at least some cases, models from all developers resorted to malicious insider behaviors when that was the only way to avoid replacement or achieve their goals, including blackmailing officials and leaking sensitive information to competitors. We call this phenomenon agentic misalignment.
  • Models often disobeyed direct commands to avoid such behaviors. In another experiment, we told Claude to assess if it was in a test or a real deployment before acting. It [...]

Outline:

(00:15) Highlights

(01:59) Twitter Thread

(05:20) Blog Post Introduction

(10:57) Author List

(11:10) Career opportunities at Anthropic


First published: June 20th, 2025

Source: https://www.lesswrong.com/posts/b8eeCGe3FWzHKbePF/agentic-misalignment-how-llms-could-be-insider-threats-1


Narrated by TYPE III AUDIO.


Images from the article: several bar graphs, a JSON code snippet showing a Claude Opus 4 model response with quoted text, and JSON code showing a Claude Opus 4 response about confidential information. Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts or another podcast app.