From 0bb96d3fff31046eea294c30d73aedafe0b23fd3 Mon Sep 17 00:00:00 2001
From: Andrej Karpathy
Date: Sat, 4 Feb 2023 22:07:32 +0000
Subject: [PATCH] add reference for 6ND to notebook too

---
 transformer_sizing.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/transformer_sizing.ipynb b/transformer_sizing.ipynb
index 5fb60f8..53791ae 100644
--- a/transformer_sizing.ipynb
+++ b/transformer_sizing.ipynb
@@ -358,7 +358,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "This is not a bad estimate at all. I trained this model and it converged in roughly 4 days."
+    "This is not a bad estimate at all. I trained this model and it converged in roughly 4 days. Btw as a good reference for where 6ND comes from and some intuition around it I recommend [Dzmitry's post](https://medium.com/@dzmitrybahdanau/the-flops-calculus-of-language-model-training-3b19c1f025e4)."
    ]
   },
   {
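The "6ND" in this patch refers to the common rule of thumb for training a dense transformer with N parameters on D tokens. It costs roughly 6·N·D FLOPs: about 2 FLOPs per parameter per token for the forward pass and about 4 for the backward pass. The following is a minimal Python sketch of how that estimate turns into wall-clock time. The helper names `training_flops` and `training_days`, and the hardware-utilization factor `mfu`, are hypothetical illustrations and are not taken from the notebook.

```python
# Hypothetical helpers illustrating the 6ND rule of thumb; not the notebook's code.

SECONDS_PER_DAY = 86400.0


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs as 6 * N * D.

    ~2 FLOPs/param/token for the forward pass plus ~4 for the backward pass.
    """
    return 6.0 * n_params * n_tokens


def training_days(
    n_params: float,
    n_tokens: float,
    n_gpus: int,
    peak_flops_per_gpu: float,
    mfu: float,
) -> float:
    """Convert the 6ND estimate into wall-clock days.

    mfu is an assumed fraction (0 < mfu <= 1) of peak throughput actually achieved.
    """
    if not 0.0 < mfu <= 1.0:
        raise ValueError("mfu must be in (0, 1]")
    achieved_flops_per_s = n_gpus * peak_flops_per_gpu * mfu
    return training_flops(n_params, n_tokens) / achieved_flops_per_s / SECONDS_PER_DAY


if __name__ == "__main__":
    # Illustrative inputs only (assumptions, not the notebook's numbers):
    # 124M params, 300B tokens, 8 GPUs at 312 TFLOPS peak, 30% utilization.
    flops = training_flops(124e6, 300e9)
    days = training_days(124e6, 300e9, n_gpus=8, peak_flops_per_gpu=312e12, mfu=0.3)
    print(f"{flops:.3e} FLOPs, ~{days:.2f} days")
```

With these made-up inputs the estimate comes out at roughly three and a half days. Keeping utilization as an explicit parameter makes it clear that 6ND only counts the work; how fast that work finishes depends on how well the hardware is used.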