Blog
Notes on machine learning, mathematics, and research.
July 6, 2025
Grokking Beyond the Euclidean Norm of Model Parameters
Grokking refers to delayed generalization that follows overfitting when artificial neural networks are optimized with gradient-based methods.
May 1, 2023
Epoch-wise bias-variance decomposition
Let's suppose we're training a model parameterized by θ, and let's denote by θ_t the parameters at step t.
May 1, 2022
Visualization of the loss landscape and optimization path of a neural network
While neural loss functions live in a very high-dimensional space, visualizations are only possible using low-dimensional plots.
August 7, 2020
Word embeddings
The genesis of my word embeddings tutorial, and what led me to machine learning research.