
Merge pull request #224 from SnehalRaj/patch-1

fix small typo
Andrej 2023-04-12 22:12:26 -07:00, committed by GitHub
commit 8aeea6d970

@@ -37,7 +37,7 @@ This creates a `train.bin` and `val.bin` in that data directory. Now it is time
 $ python train.py config/train_shakespeare_char.py
 ```
-If you peak inside it, you'll see that we're training a GPT with a context size of up to 256 characters, 384 feature channels, and it is a 6-layer Transformer with 6 heads in each layer. On one A100 GPU this training run takes about 3 minutes and the best validation loss is 1.4697. Based on the configuration, the model checkpoints are being written into the `--out_dir` directory `out-shakespeare-char`. So once the training finishes we can sample from the best model by pointing the sampling script at this directory:
+If you peek inside it, you'll see that we're training a GPT with a context size of up to 256 characters, 384 feature channels, and it is a 6-layer Transformer with 6 heads in each layer. On one A100 GPU this training run takes about 3 minutes and the best validation loss is 1.4697. Based on the configuration, the model checkpoints are being written into the `--out_dir` directory `out-shakespeare-char`. So once the training finishes we can sample from the best model by pointing the sampling script at this directory:
 ```
 $ python sample.py --out_dir=out-shakespeare-char
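
For orientation, the model described in the changed paragraph corresponds to a handful of overrides in `config/train_shakespeare_char.py`. The sketch below is a minimal reconstruction: the values (256, 384, 6 layers, 6 heads, `out-shakespeare-char`) come from the paragraph above, while the variable names assume nanoGPT's usual config conventions rather than being quoted from this diff.
```
# Sketch of config/train_shakespeare_char.py overrides.
# Values taken from the paragraph above; names assume nanoGPT conventions.
out_dir = 'out-shakespeare-char'  # where checkpoints are written
block_size = 256                  # context size of up to 256 characters
n_layer = 6                       # 6-layer Transformer
n_head = 6                        # 6 attention heads per layer
n_embd = 384                      # 384 feature channels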