
Want to Understand Neural Networks? Think Elastic Origami! - Prof. Randall Balestriero

2025/2/8
Machine Learning Street Talk (MLST)

People
Randall Balestriero
Topics
Randall Balestriero: During deep network training, continuing to train past the point where the training metrics have plateaued can still improve the test metrics, because the network keeps adjusting its weights to generalize better to the test samples. Grokking is usually reported in narrow settings: particular tasks, models, and weight initializations. In more general settings, such as computer vision, the improvement in test accuracy under adversarial noise typically appears only after clean training and test accuracy have stabilized. In other words, even without adversarial training, adversarial robustness emerges naturally given long enough training; this emergent robustness is the product of extended training and the appearance of sparse solutions. Neural networks with piecewise-linear nonlinearities cut the input space into linear regions and map each region linearly to the output, like elastic origami: the network warps the input space and then cuts it with straight lines to separate the classes. The density of these cuts differs across samples, and samples in densely cut regions are more sensitive to adversarial attacks. Analyzing the partition statistics of a local region reveals how differently the model behaves across samples, which in turn can expose algorithmic bias toward certain samples. Rather than summarizing a neural network in a single sentence, understanding how it behaves on different regions of the data manifold gives a fuller picture.

Randall Balestriero: In spline approximation, the choice of the space partition and of the polynomial order within each region is crucial. Placing the partition regions well yields a better approximation than raising the polynomial order, even while staying piecewise affine. Neural networks adapt the partition through their trainable parameters: in learning the partition, they also learn the affine mappings. The network's regions concentrate around the training points, and smaller regions give a more precise approximation.
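To make the "cut density" point concrete, here is a minimal sketch, assuming a toy random two-layer ReLU network (none of this is code from the episode or the papers): each distinct on/off pattern of the hidden ReLUs identifies one linear region of the input-space partition, so counting the patterns hit by random points around a sample estimates how densely the network cuts that neighbourhood.

```python
# Sketch (illustrative toy, not code from the papers): estimating how
# finely a small ReLU net "cuts" the neighbourhood of a sample.
import numpy as np

rng = np.random.default_rng(0)

# A toy two-layer ReLU network: f(x) = W2 @ relu(W1 @ x + b1) + b2
W1, b1 = rng.normal(size=(16, 2)), rng.normal(size=16)
W2, b2 = rng.normal(size=(1, 16)), rng.normal(size=1)

def activation_pattern(x):
    """Binary on/off pattern of the hidden ReLUs at x.
    Each distinct pattern corresponds to one linear region
    of the input-space partition."""
    return tuple((W1 @ x + b1 > 0).astype(int))

def local_region_count(x, radius=0.5, n_samples=2000):
    """Count the distinct linear regions hit by random points near x.
    More regions = denser cuts = (per the discussion above) higher
    sensitivity to small adversarial perturbations."""
    pts = x + radius * rng.uniform(-1, 1, size=(n_samples, 2))
    return len({activation_pattern(p) for p in pts})

for x in [np.zeros(2), np.array([3.0, 3.0])]:
    print(x, "->", local_region_count(x), "regions nearby")
```

Comparing this count across samples is the kind of local partition statistic the discussion refers to when assessing per-sample sensitivity and bias.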
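And a companion sketch for the spline-approximation point, again purely illustrative: with the same budget of piecewise-affine segments, breakpoints adapted to where the target function bends approximate it far better than uniformly spaced ones. Deep networks learn this kind of partition placement through their trainable parameters.

```python
# Sketch (illustrative): same number of piecewise-affine segments,
# but adapting WHERE the partition breaks fall beats a uniform partition.
import numpy as np

def pw_linear_max_error(knots, xs, ys):
    """Max error of the piecewise-linear interpolant through (knot, f(knot))."""
    yhat = np.interp(xs, knots, np.interp(knots, xs, ys))
    return np.max(np.abs(yhat - ys))

f = lambda x: np.tanh(8 * x)            # bends sharply near x = 0
xs = np.linspace(-1, 1, 2001)
ys = f(xs)

uniform = np.linspace(-1, 1, 7)         # 6 segments, evenly spaced
adapted = np.array([-1, -0.3, -0.12, 0, 0.12, 0.3, 1])  # knots packed near the bend

print("uniform knots, max error:", pw_linear_max_error(uniform, xs, ys))
print("adapted knots, max error:", pw_linear_max_error(adapted, xs, ys))
```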

Chapters
This chapter introduces the spline theory of neural networks, explaining how neural networks partition input space into linear regions and perform an affine mapping within each region. It compares this to methods like k-means clustering and highlights the adaptive nature of deep network partitions, which enables superior extrapolation (a code sketch of the region-wise affine map follows the list below).
  • Neural networks partition input space into linear regions.
  • Within each region, the network performs an affine mapping.
  • The partition adapts during training, improving extrapolation.
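
A minimal sketch of the region-wise affine map, assuming a toy random ReLU network (names and sizes are illustrative, not a model from the episode): freezing the ReLU on/off mask at a point x0 yields the slope A and offset b that the network applies everywhere inside x0's region.

```python
# Sketch (toy random net): within one linear region the network is
# exactly affine, f(x) = A @ x + b, with A and b set by the frozen ReLU mask.
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(8, 3)), rng.normal(size=8)
W2, b2 = rng.normal(size=(2, 8)), rng.normal(size=2)

def f(x):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

x0 = rng.normal(size=3)
mask = (W1 @ x0 + b1 > 0).astype(float)  # region = fixed ReLU on/off mask

A = W2 @ (mask[:, None] * W1)            # per-region slope (the Jacobian)
b = W2 @ (mask * b1) + b2                # per-region offset

# A tiny step that (almost surely) stays in the same region
# is reproduced exactly by the affine map:
x1 = x0 + 1e-4 * rng.normal(size=3)
assert np.allclose(f(x1), A @ x1 + b)
print("locally affine check passed")
```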

Shownotes

Professor Randall Balestriero joins us to discuss neural network geometry, spline theory, and emerging phenomena in deep learning, based on research presented at ICML. Topics include the delayed emergence of adversarial robustness in neural networks ("grokking"), geometric interpretations of neural networks via spline theory, and challenges in reconstruction learning. We also cover geometric analysis of Large Language Models (LLMs) for toxicity detection and the relationship between intrinsic dimensionality and model control in RLHF.

SPONSOR MESSAGES:


CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

https://centml.ai/pricing/

Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. Are you interested in working on reasoning or getting involved in their events?

Go to https://tufalabs.ai/


Randall Balestriero

https://x.com/randall_balestr

https://randallbalestriero.github.io/

Show notes and transcript: https://www.dropbox.com/scl/fi/3lufge4upq5gy0ug75j4a/RANDALLSHOW.pdf?rlkey=nbemgpa0jhawt1e86rx7372e4&dl=0

TOC:

  • Introduction
      • 00:00:00: Introduction
  • Neural Network Geometry and Spline Theory
      • 00:01:41: Neural Network Geometry and Spline Theory
      • 00:07:41: Deep Networks Always Grok
      • 00:11:39: Grokking and Adversarial Robustness
      • 00:16:09: Double Descent and Catastrophic Forgetting
  • Reconstruction Learning
      • 00:18:49: Reconstruction Learning
      • 00:24:15: Frequency Bias in Neural Networks
  • Geometric Analysis of Neural Networks
      • 00:29:02: Geometric Analysis of Neural Networks
      • 00:34:41: Adversarial Examples and Region Concentration
  • LLM Safety and Geometric Analysis
      • 00:40:05: LLM Safety and Geometric Analysis
      • 00:46:11: Toxicity Detection in LLMs
      • 00:52:24: Intrinsic Dimensionality and Model Control
      • 00:58:07: RLHF and High-Dimensional Spaces
  • Conclusion
      • 01:02:13: Neural Tangent Kernel
      • 01:08:07: Conclusion

REFS:

[00:01:35] Humayun – Deep network geometry & input space partitioning

https://arxiv.org/html/2408.04809v1

[00:03:55] Balestriero & Baraniuk – Linking deep networks to adaptive spline operators

https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf

[00:13:55] Song et al. – Gradient-based white-box adversarial attacks

https://arxiv.org/abs/2012.14965

[00:16:05] Humayun, Balestriero & Baraniuk – Grokking phenomenon & emergent robustness

https://arxiv.org/abs/2402.15555

[00:18:25] Humayun – Training dynamics & double descent via linear region evolution

https://arxiv.org/abs/2310.12977

[00:20:15] Balestriero – Power diagram partitions in DNN decision boundaries

https://arxiv.org/abs/1905.08443

[00:23:00] Frankle & Carbin – Lottery Ticket Hypothesis for network pruning

https://arxiv.org/abs/1803.03635

[00:24:00] Belkin et al. – Double descent phenomenon in modern ML

https://arxiv.org/abs/1812.11118

[00:25:55] Balestriero et al. – Batch normalization’s regularization effects

https://arxiv.org/pdf/2209.14778

[00:29:35] EU – EU AI Act 2024 with compute restrictions

https://www.lw.com/admin/upload/SiteAttachments/EU-AI-Act-Navigating-a-Brave-New-World.pdf

[00:39:30] Humayun, Balestriero & Baraniuk – SplineCam: Visualizing deep network geometry

https://openaccess.thecvf.com/content/CVPR2023/papers/Humayun_SplineCam_Exact_Visualization_and_Characterization_of_Deep_Network_Geometry_and_CVPR_2023_paper.pdf

[00:40:40] Carlini – Trade-offs between adversarial robustness and accuracy

https://arxiv.org/pdf/2407.20099

[00:44:55] Balestriero & LeCun – Limitations of reconstruction-based learning methods

https://openreview.net/forum?id=ez7w0Ss4g9

(truncated, see shownotes PDF)