Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok
Pascal Jr. Tikeng Notsawo, Hattie Zhou, Mohammad Pezeshki, Irina Rish, Guillaume Dumas, ICLR 2024 Workshop on Mathematical and Empirical Understanding of Foundation Models, 2023.