Grokking Beyond the Euclidean Norm of Model Parameters
Grokking refers to delayed generalization following overfitting when optimizing artificial neural networks with gradient-based methods. We show that the dynamics of grokking go beyond the $\ell_2$ norm: if there exists a model with a property $P$ (e.g., sparse or low-rank weights) that fits the data, then gradient descent (GD) with a small (explicit or implicit) regularization of $P$ (e.g., $\ell_1$ or nuclear norm regularization) will also result in grokking, provided the number of training samples is large enough. Moreover, the $\ell_2$ norm of the parameters is no longer guaranteed to decrease with generalization when it is not the property sought.
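To make the setting concrete, here is a minimal sketch of gradient descent with a small sparsity-inducing ℓ1 penalty on a toy sparse linear regression problem. It is an illustration under stated assumptions, not the paper's experimental setup: the function name `l1_regularized_gd`, the parameters `beta`, `lr` and `steps`, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np

def l1_regularized_gd(X, y, w_star, beta=1e-3, lr=1e-2, steps=2000, seed=0):
    """(Sub)gradient descent on 0.5 * mean((Xw - y)^2) + beta * ||w||_1.

    Hypothetical helper for illustration. Records, at every step, the
    training loss, the l1 norm, the l2 norm, and the distance to w_star.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(size=d)  # dense (non-sparse) initialization
    history = np.zeros((steps, 4))
    for t in range(steps):
        residual = X @ w - y
        history[t] = (
            0.5 * np.mean(residual ** 2),   # training loss
            np.abs(w).sum(),                # l1 norm (the regularized property)
            np.linalg.norm(w),              # l2 norm (not regularized here)
            np.linalg.norm(w - w_star),     # recovery error, a proxy for test error
        )
        # np.sign(w) is a valid subgradient of ||w||_1 (it is 0 where w == 0).
        grad = X.T @ residual / n + beta * np.sign(w)
        w = w - lr * grad
    return w, history

# Toy problem (assumed for illustration): n < d, with a sparse ground truth.
rng = np.random.default_rng(0)
n, d, s = 50, 100, 5
w_star = np.zeros(d)
w_star[:s] = 1.0
X = rng.normal(size=(n, d))
y = X @ w_star

w, hist = l1_regularized_gd(X, y, w_star)
```

Plotting the columns of `hist` against the step index lets one compare how the training loss, the ℓ1 norm, the ℓ2 norm, and the recovery error evolve. Whether and when delayed generalization appears depends on `beta`, the learning rate, the initialization scale, and the number of samples, so these values should be treated as starting points to vary rather than tuned settings.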
Paper: Grokking Beyond the Euclidean Norm of Model Parameters, Pascal Jr. Tikeng Notsawo, Guillaume Dumas, Guillaume Rabusseau, Forty-Second International Conference on Machine Learning (ICML), 2025. https://arxiv.org/abs/2506.05718
Blog post: https://hackmd.io/@6LQ4mvRtS4Sc3LHkNEvDXQ/BytCby2Ugl