Blog

Notes on machine learning, mathematics, and research.

July 6, 2025
Grokking Beyond the Euclidean Norm of Model Parameters
Grokking refers to delayed generalization that follows overfitting when artificial neural networks are optimized with gradient-based methods.
Deep Learning, Grokking, Delayed Generalization, Regularization, Sparsity, Low-Rank, Overparameterization, Gradient Descent, Implicit Regularization
May 1, 2023
Epoch-wise bias-variance decomposition
Let's suppose we're training a model parameterized by theta, and let's denote by theta_t the parameter at step t.
deep learning, statistical learning, bias-variance tradeoff
May 1, 2022
Visualization of the loss landscape and optimization path of a neural network
Neural loss functions live in a very high-dimensional space, but they can only be visualized with low-dimensional plots.
deep learning, loss landscape
August 7, 2020
Word embeddings
The genesis of my word embeddings tutorial, and what led me to machine learning research.
NLP, Glove, Word2Vec, Bag of words, TF-IDF