“The slingshot helps with learning” by Wilson Wu

2024/11/1
LessWrong (30+ Karma)

Audio note: this article contains 76 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description.

The slingshot effect is a late-stage training anomaly found in various adaptive gradient optimization methods. In particular, slingshots are present with AdamW, the optimizer most widely used for modern transformer training. The original slingshot paper observes that slingshots tend to occur alongside grokking, a phenomenon in which neural networks trained on algorithmic tasks generalize to the test set long after perfectly fitting the training set. In this post we take a closer look at slingshots and their effect on generalization in the setting of 1-hidden-layer MLPs trained on k-sparse parity, a specific algorithmic task. The main results are 1) an explanation of why slingshots occur in models trained with hinge loss that partially transfers to models trained with [...]
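For readers unfamiliar with the task named above, here is a minimal sketch of how a k-sparse parity dataset is commonly generated: the label is the parity of k fixed coordinates out of n input bits, and the remaining coordinates are distractors. The function name `make_sparse_parity`, the 0/1 encoding, and the sizes used are assumptions for illustration and are not taken from the post, which may use a different encoding (for example ±1) or sampling scheme.

```python
import numpy as np

def make_sparse_parity(num_samples, n_bits, k, seed=0):
    """Sample a k-sparse parity dataset (illustrative helper, not from the post).

    Inputs are uniform random bit strings of length n_bits. The label is the
    XOR of the k bits at a fixed, randomly chosen set of positions.
    """
    rng = np.random.default_rng(seed)
    # Hidden support: the k coordinates the label actually depends on.
    support = np.sort(rng.choice(n_bits, size=k, replace=False))
    # Uniform random inputs in {0, 1}.
    x = rng.integers(0, 2, size=(num_samples, n_bits))
    # Parity of the support bits: 1 if an odd number of them are set.
    y = x[:, support].sum(axis=1) % 2
    return x, y, support

x, y, support = make_sparse_parity(num_samples=8, n_bits=10, k=3)
print("support:", support)
print("labels: ", y)
```

Flipping any single bit inside the support flips the label, while flipping a bit outside it leaves the label unchanged, which is what makes the task hard to learn from local signal alone.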


Outline:

(01:09) 1 Introduction

(01:13) 1.1 The k-sparse parity task

(02:39) 1.2 The slingshot effect

(03:19) 2 Why do slingshots happen in the k-sparse parity setting?

(03:48) 2.1 Hinge loss

(08:41) 2.2 Cross-entropy loss

(10:07) 2.3 Resetting AdamW mitigates slingshots

(11:20) 3 Slingshots and generalization

(11:24) 3.1 Slingshots tend to decrease test loss

(11:57) 3.2 Slingshots vs random jumps

(13:59) 3.3 Slingshots and LLC

(15:48) 4 Discussion

The original text contained 7 footnotes which were omitted from this narration.


First published: October 31st, 2024

Source: https://www.lesswrong.com/posts/LncYobrn3vRr7qkZW/the-slingshot-helps-with-learning

Narrated by TYPE III AUDIO.

