Grokking Beyond the Euclidean Norm of Model Parameters

Grokking refers to delayed generalization that follows overfitting when artificial neural networks are optimized with gradient-based methods. We show that the dynamics of grokking go beyond the $\ell_2$ norm: if there exists a model with a property $P$ (e.g., sparse or low-rank weights) that fits the data, then gradient descent (GD) with a small (explicit or implicit) regularization of $P$ (e.g., $\ell_1$ or nuclear norm regularization) will also result in grokking, provided the number of training samples is large enough. Moreover, the $\ell_2$ norm of the parameters is no longer guaranteed to decrease with generalization when it is not the property sought.
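
To make the claim concrete, here is a minimal sketch of GD with a small explicit $\ell_1$ penalty on a toy sparse linear regression problem. The toy problem, the function name `run_l1_subgradient_descent`, and every hyperparameter value are illustrative assumptions, not the experimental setup of the paper. The sketch logs the training loss, the distance to the sparse ground truth (a stand-in for generalization error), and both the $\ell_1$ and $\ell_2$ norms of the weights, so the timescales on which these quantities move can be compared.

```python
import numpy as np


def run_l1_subgradient_descent(n_features=100, n_samples=60, sparsity=5,
                               lr=0.01, beta=1e-3, steps=5000,
                               log_every=100, init_scale=1.0, seed=0):
    """Subgradient descent on 0.5/N * ||X w - y||^2 + beta * ||w||_1.

    All defaults are hypothetical values chosen for illustration only.
    Returns a list of dicts, one per logged step.
    """
    rng = np.random.default_rng(seed)

    # Assumed ground truth: a sparse weight vector (the "property P").
    w_star = np.zeros(n_features)
    support = rng.choice(n_features, size=sparsity, replace=False)
    w_star[support] = rng.normal(size=sparsity)

    # Fewer samples than features, so many dense vectors also fit the data.
    X = rng.normal(size=(n_samples, n_features))
    y = X @ w_star

    # Dense, relatively large initialization.
    w = init_scale * rng.normal(size=n_features)

    history = []
    for t in range(steps):
        residual = X @ w - y
        if t % log_every == 0:
            history.append({
                "step": t,
                "train_loss": 0.5 * float(np.mean(residual ** 2)),
                "recovery_error": float(np.linalg.norm(w - w_star)),
                "l1_norm": float(np.sum(np.abs(w))),
                "l2_norm": float(np.linalg.norm(w)),
            })
        # Gradient of the data-fit term plus a subgradient of beta * ||w||_1.
        grad = X.T @ residual / n_samples + beta * np.sign(w)
        w = w - lr * grad
    return history


if __name__ == "__main__":
    for row in run_l1_subgradient_descent():
        print(row)
```

Printing or plotting `train_loss` against `recovery_error` over the logged steps is one way to inspect whether the data is fit well before the sparse solution is recovered, and whether `l2_norm` follows `recovery_error` in this toy setting. Increasing `steps` or varying `beta` and `lr` changes how long the run takes, so the defaults should be treated as a starting point only.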

Paper: Grokking Beyond the Euclidean Norm of Model Parameters, Pascal Jr. Tikeng Notsawo, Guillaume Dumas, Guillaume Rabusseau, Forty-Second International Conference on Machine Learning (ICML), 2025. https://arxiv.org/abs/2506.05718

Blog post: https://hackmd.io/@6LQ4mvRtS4Sc3LHkNEvDXQ/BytCby2Ugl