Writing

Blog

Notes on machine learning, mathematics, and research.

Grokking Beyond the Euclidean Norm of Model Parameters

Grokking refers to delayed generalization that follows overfitting when artificial neural networks are optimized with gradient-based methods.

Deep Learning, Grokking, Delayed Generalization, Regularization, Sparsity, Low-Rank, Overparameterization, Gradient Descent, Implicit Regularization

Epoch-wise bias-variance decomposition

Let's suppose we're training a model parameterized by theta, and let's denote by theta_t the parameters at step t.

deep learning, statistical learning, bias-variance tradeoff

Visualization of the loss landscape and optimization path of a neural network

While neural loss functions live in a very high-dimensional space, visualizations are only possible using low-dimensional plots.

deep learning, loss landscape

Word embeddings

The genesis of my word embeddings tutorial, and what led me to machine learning research.

NLP, GloVe, Word2Vec, Bag of words, TF-IDF