“What’s the Right Way to think about Information Theoretic quantities in Neural Networks?” by Dalcy

2025/1/19

LessWrong (30+ Karma)


Shownotes

Tl;dr: Neural networks are deterministic and sometimes even reversible, which causes Shannon information measures to degenerate. Yet information theory seems useful. How can we square this (if it's possible at all)? The attempts in the literature so far are unsatisfying. So here is a conceptual question: what is the Right Way to think about information theoretic quantities in neural network contexts? Example: I've recently been thinking about information bottleneck methods: given some data distribution P(X, Y), the method tries to find features Z, specified by an encoder P(Z|X), that have nice properties like minimality (small I(X;Z)) and sufficiency (large I(Z;Y)). But as has been pointed out several times in the literature, the fact that neural networks implement a deterministic map makes these information-theoretic quantities degenerate:

if Z is a deterministic map of X and they’re both continuous, then I(X;Z) is infinite. Binning / Quantizing them does turn Z into a stochastic function [...]
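(For reference, and not part of the post's excerpt: the information bottleneck objective sketched above is usually written as a Lagrangian over the encoder P(Z|X), trading compression against relevance, with β controlling the trade-off.)

```latex
% Standard information bottleneck Lagrangian (Tishby, Pereira & Bialek, 1999):
% pick the encoder P(Z|X) that compresses X (small I(X;Z))
% while staying predictive of Y (large I(Z;Y)).
\min_{P(Z \mid X)} \; I(X;Z) \;-\; \beta \, I(Z;Y)
```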


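To make the quoted binning point concrete, here is a small illustrative sketch (mine, not the author's; the feature map `np.tanh(3*x)` and the bin counts are arbitrary choices). Quantizing X and a deterministic Z = f(X) yields a finite plug-in mutual-information estimate, but the estimate keeps growing as the bins get finer, so it mostly reflects the quantization resolution rather than any property of the map itself.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
z = np.tanh(3.0 * x)  # arbitrary deterministic (and invertible) "feature map" Z = f(X)

def binned_mi(x, z, n_bins):
    """Plug-in estimate of I(X;Z) in bits after quantizing both variables."""
    joint, _, _ = np.histogram2d(x, z, bins=n_bins)
    p_xz = joint / joint.sum()
    p_x = p_xz.sum(axis=1, keepdims=True)  # P(X-bin): sum out Z
    p_z = p_xz.sum(axis=0, keepdims=True)  # P(Z-bin): sum out X
    mask = p_xz > 0
    return float(np.sum(p_xz[mask] * np.log2(p_xz[mask] / (p_x @ p_z)[mask])))

# Finite for any fixed binning, but it keeps increasing as the bins shrink --
# consistent with the true I(X;Z) being infinite for a deterministic continuous map.
for n_bins in (8, 32, 128, 512):
    print(f"{n_bins:4d} bins: I_hat(X;Z) = {binned_mi(x, z, n_bins):.2f} bits")
```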
Outline:

(02:28) Treat the weights as stochastic:

(04:23) Use something other than Shannon information measures:


First published: January 19th, 2025

Source: https://www.lesswrong.com/posts/E5EazNvQHiAKDxW3W/what-s-the-right-way-to-think-about-information-theoretic

---

Narrated by TYPE III AUDIO.