
“Mech interp is not pre-paradigmatic” by Lee Sharkey

2025/6/10

LessWrong (30+ Karma)


This is a blogpost version of a talk I gave earlier this year at GDM. Epistemic status: Vague and handwavy. Nuance is often missing. Some of the claims depend on implicit definitions that may be reasonable to disagree with. But overall I think it's directionally true.

It's often said that mech interp is pre-paradigmatic.

I think it's worth being skeptical of this claim.

In this post I argue that:

  • Mech interp is not pre-paradigmatic.
  • Within its paradigm, there have been two "waves" (mini-paradigms) so far.
  • Second-Wave Mech Interp has recently entered a 'crisis' phase.
  • We may be on the edge of a third wave.

**Preamble: Kuhn, paradigms, and paradigm shifts**

First, we need to be familiar with the basic definition of a paradigm: A paradigm is a distinct set of concepts or thought patterns, including theories, research [...]


Outline:

(00:58) Preamble: Kuhn, paradigms, and paradigm shifts

(03:56) Claim: Mech Interp is Not Pre-paradigmatic

(07:56) First-Wave Mech Interp (ca. 2012 - 2021)

(10:21) The Crisis in First-Wave Mech Interp

(11:21) Second-Wave Mech Interp (ca. 2022 - ??)

(14:23) Anomalies in Second-Wave Mech Interp

(17:10) The Crisis of Second-Wave Mech Interp (ca. 2025 - ??)

(18:25) Toward Third-Wave Mechanistic Interpretability

(20:28) The Basics of Parameter Decomposition

(22:40) Parameter Decomposition Questions Foundational Assumptions of Second-Wave Mech Interp

(24:13) Parameter Decomposition In Theory Resolves Anomalies of Second-Wave Mech Interp

(27:27) Conclusion

The original text contained 6 footnotes which were omitted from this narration.


First published: June 10th, 2025

Source: https://www.lesswrong.com/posts/beREnXhBnzxbJtr8k/mech-interp-is-not-pre-paradigmatic


Narrated by TYPE III AUDIO.


Images from the article:

![Academic slide discussing pre-paradigm phase of mechanistic interpretation in neuroscience, with diagrams. The slide includes three scientific figures: a hierarchical neural connectivity diagram from Hubel and Wiesel (1958), a complex network visualization from Rousselet (2004), and neural network weights visualization from Rumelhart (1986). The main text outlines concepts, methods, and standards from computational neuroscience and connectionism.](https://lh7-rt.googleusercontent.com/docsz/AD_4nXfpClEDMDIICv0fc7c1v7Kg84TY2Lihifrwn9AMr-HFVTjA2NkLMRG2OmzA7FePEF90WhoIpXb4KOd0pL2TV0tk8CRPwVgGgA3Qab_rGB09gxrG6phNALHtUnoPzYVCxLXItYgy3w?key=RjtCWWHJ6jQ9e-KD2lcWVw)

![Concept map](https://lh7-rt.googleusercontent.com/docsz/AD_4nXdsw7VURssh6ptCEmLTo6Mmc8Wu2R2-azIIyasSHplj1vHFBsArVimPg7VsPq2jiBkfECBGtqJ8ImKMTCC4Y-s6enSdeyj8nuA5_osFchghOh0KPWbbGe5s6kb6rRSiyiVmDSajUw?key=RjtCWWHJ6jQ9e-KD2lcWVw)

![Diagram](https://lh7-rt.googleusercontent.com/docsz/AD_4nXfksCeoqVPFn7mKDpjw5M7QX5ogYwfACjqlnrpU9z9QrdxKUsZtoAGsHbHmqqS0CxEUdRN8s1yYXodfyr8yArHr3-1t9MGoOxvbgw1HunG9OIME_x-0ZbkAMAUr7T2tgaYrUFJBuw?key=RjtCWWHJ6jQ9e-KD2lcWVw)

![Slide showing parameter decomposition concept with neural network diagrams and matrix visualization. The image illustrates parameter decomposition in neural networks, showing a process of flattening network weights into a parameter vector, then decomposing it into simpler components. The diagram includes matrix representations and simplified network structures to demonstrate how the decomposition works.](https://lh7-rt.googleusercontent.com/docsz/AD_4nXfViW88c4Ogg8Tuvd9iO3O3hX-F30K0pcWOpyhPz5j-uzAYU5OpYvHobHdtahb-tmsP9QXmzZ7iURjs8Mi13MZvg-UPITCNVKwKRtsUwwMKAoB2Gwlss9IW_ZDvmgjX5VwUcmYQjw?key=RjtCWWHJ6jQ9e-KD2lcWVw)