“What’s the Right Way to think about Information Theoretic quantities in Neural Networks?” by Dalcy

2025/1/19

LessWrong (30+ Karma)


Shownotes

Tl;dr: Neural networks are deterministic and sometimes even reversible, which causes Shannon information measures to degenerate. Yet information theory seems useful. How can we square this (if it's possible at all)? The attempts in the literature so far are unsatisfying. So here is a conceptual question: what is the Right Way to think about information theoretic quantities in neural network contexts? Example: I've recently been thinking about information bottleneck methods: given some data distribution P(X, Y), the method tries to find features Z, specified by an encoder P(Z|X), that have nice properties like minimality (small I(X;Z)) and sufficiency (large I(Z;Y)). But as has been pointed out several times in the literature, the fact that neural networks implement a deterministic map makes these information-theoretic quantities degenerate:

if Z is a deterministic map of X and they’re both continuous, then I(X;Z) is infinite. Binning / Quantizing them does turn Z into a stochastic function [...]
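(For reference, and not part of the post's excerpt: the information bottleneck objective sketched above is usually written as a Lagrangian over the encoder P(Z|X), trading compression against relevance, with β controlling the trade-off.)

```latex
% Standard information bottleneck Lagrangian (Tishby, Pereira & Bialek, 1999):
% pick the encoder P(Z|X) that compresses X (small I(X;Z))
% while staying predictive of Y (large I(Z;Y)).
\min_{P(Z \mid X)} \; I(X;Z) \;-\; \beta \, I(Z;Y)
```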


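To make the quoted binning point concrete, here is a small illustrative sketch (mine, not the author's; the feature map `np.tanh(3*x)` and the bin counts are arbitrary choices). Quantizing X and a deterministic Z = f(X) yields a finite plug-in mutual-information estimate, but the estimate keeps growing as the bins get finer, so it mostly reflects the quantization resolution rather than any property of the map itself.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
z = np.tanh(3.0 * x)  # arbitrary deterministic (and invertible) "feature map" Z = f(X)

def binned_mi(x, z, n_bins):
    """Plug-in estimate of I(X;Z) in bits after quantizing both variables."""
    joint, _, _ = np.histogram2d(x, z, bins=n_bins)
    p_xz = joint / joint.sum()
    p_x = p_xz.sum(axis=1, keepdims=True)  # P(X-bin): sum out Z
    p_z = p_xz.sum(axis=0, keepdims=True)  # P(Z-bin): sum out X
    mask = p_xz > 0
    return float(np.sum(p_xz[mask] * np.log2(p_xz[mask] / (p_x @ p_z)[mask])))

# Finite for any fixed binning, but it keeps increasing as the bins shrink --
# consistent with the true I(X;Z) being infinite for a deterministic continuous map.
for n_bins in (8, 32, 128, 512):
    print(f"{n_bins:4d} bins: I_hat(X;Z) = {binned_mi(x, z, n_bins):.2f} bits")
```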
Outline:

(02:28) Treat the weights as stochastic:

(04:23) Use something other than Shannon information measures:


First published: January 19th, 2025

Source: https://www.lesswrong.com/posts/E5EazNvQHiAKDxW3W/what-s-the-right-way-to-think-about-information-theoretic

---

Narrated by TYPE III AUDIO.