diff --git a/transformer_sizing.ipynb b/transformer_sizing.ipynb
index 5fb60f8..53791ae 100644
--- a/transformer_sizing.ipynb
+++ b/transformer_sizing.ipynb
@@ -358,7 +358,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "This is not a bad estimate at all. I trained this model and it converged in roughly 4 days."
+    "This is not a bad estimate at all. I trained this model and it converged in roughly 4 days. Btw as a good reference for where 6ND comes from and some intuition around it I recommend [Dzmitry's post](https://medium.com/@dzmitrybahdanau/the-flops-calculus-of-language-model-training-3b19c1f025e4)."
    ]
   },
   {