Ed and Anna are co-first authors on this work.
TL;DR
Introduction
Emergent Misalignment found that fine-tuning models on narrowly misaligned data, such as insecure code [...]
Outline:
(00:16) TL;DR
(01:19) Introduction
(03:25) Coherent Emergent Misalignment
(07:02) EM with 0.5B Parameters
(08:11) EM with a Full Supervised Finetune
(09:13) EM with a Single Rank 1 LoRA Adapter
(10:01) Future Work
(11:05) Contributions
(11:33) Acknowledgments
The original text contained 6 footnotes which were omitted from this narration.
First published: June 16th, 2025
Source: https://www.lesswrong.com/posts/yHmJrDSJpFaNTZ9Tr/model-organisms-for-emergent-misalignment
---
Narrated by TYPE III AUDIO.