Audio note: this article contains 55 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description.
**Introduction**
This is the first installment of my January writing project. We will look at generative neural networks through the framework of (probabilistic) "formal grammars", specifically focusing on building a complex grammar out of simple "rule grammars". This turns out to lead to a nice and relatively non-technical way of discussing how complex systems like language models can be built out of "heuristics". Thinking more about how these blocks are combined (and focusing on the difference between combining rules via "AND" vs. "OR") leads to some new insights about generalization, which are sometimes lost in the fuzzy language of heuristics and circuits. In a follow-up to this post we'll apply the tools introduced here in a more [...]
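To make the AND/OR distinction concrete before the outline: here is a minimal, hypothetical sketch (not from the post) that treats each "rule grammar" as a predicate on strings, with AND-combination accepting only strings that satisfy every rule and OR-combination accepting strings that satisfy at least one. All names and the toy rules are illustrative assumptions, not the author's definitions.

```python
# Illustrative sketch: combining simple "rule grammars" into a complex grammar.
# Each rule is a predicate on strings; AND takes the intersection of the rules'
# languages, OR takes the union.

from typing import Callable

Rule = Callable[[str], bool]

def and_grammar(rules: list[Rule]) -> Rule:
    """A string is grammatical iff it satisfies *every* rule."""
    return lambda s: all(rule(s) for rule in rules)

def or_grammar(rules: list[Rule]) -> Rule:
    """A string is grammatical iff it satisfies *at least one* rule."""
    return lambda s: any(rule(s) for rule in rules)

# Two toy rules on binary strings (hypothetical examples).
starts_with_0: Rule = lambda s: s.startswith("0")
even_length: Rule = lambda s: len(s) % 2 == 0

both = and_grammar([starts_with_0, even_length])
either = or_grammar([starts_with_0, even_length])

print(both("0110"))    # True: starts with 0 AND has even length
print(both("011"))     # False: odd length fails the AND
print(either("1110"))  # True: even length alone suffices under OR
```

The two combinators generalize very differently: AND shrinks the accepted language as rules are added, while OR grows it, which is one way to read the post's point about generalization.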
Outline:
(00:18) Introduction
(01:29) Grammars, subgrammars, and circuits
(03:33) Subgrammars
(05:03) Probabilistic grammars and a tiny bit of linguistics
(10:12) Heuristics
(13:19) Rules, subgrammars and logits
(18:28) Subgrammars in mechinterp
(18:44) Modular addition: definition
(20:42) Modular addition: circuits as subgrammars
(24:24) Upshots so far
(26:04) Memorization and beyond
(31:29) Future directions
The original text contained 2 footnotes which were omitted from this narration.
First published: January 2nd, 2025
Source: https://www.lesswrong.com/posts/sjoW35fgBJ82ADuND/grammars-rule-combinatorics-and-generalization-1
---
Narrated by TYPE III AUDIO.