diff --git a/README.md b/README.md
index 82f313a..2ba86fd 100644
--- a/README.md
+++ b/README.md
@@ -157,7 +157,6 @@ Features / APIs
 
 Suspiciousness
 
-- Current initialization (PyTorch default) departs from GPT-2. In a very quick experiment I found it to be superior to the one suggested in the papers, but that can't be right?
 - I am still not 100% confident that my GPT-2 small reproduction hyperparameters are good, if someone has reproduced GPT-2 I'd be eager to exchange notes ty
 - I keep seeing different values cited for weight decay and AdamW betas, look into
 - I can't exactly reproduce Chinchilla paper results, see [scaling_laws.ipynb](scaling_laws.ipynb) notebook