Mirror of https://github.com/osmarks/nanogpt-experiments.git, synced 2024-12-18 14:10:28 +00:00
mention current 8GPU SOTA and shuffle sections a bit
This commit is contained in:
parent 1eefbb2520
commit ab04701f9f

README.md (22 changed lines)
@@ -42,17 +42,7 @@ To my knowledge, running this with the current script with the GPT-2 hyperparame
 $ python sample.py
 ```

-Training on 1 A100 40GB GPU overnight currently gets loss ~3.74, training on 4 gets ~3.60. Random chance at init is -ln(1/50257) = 10.82. Which brings us to baselines:
-
-## finetuning
-
-For an example of how to finetune a GPT on new text go to `data/shakespeare` and look at `prepare.py` to download the tiny shakespeare dataset and render it into a `train.bin` and `val.bin`. Unlike OpenWebText this will run in seconds. Finetuning takes very little time, e.g. on a single GPU just a few minutes. Run an example finetuning like:
-
-```
-$ python train.py finetune_shakespeare
-```
-
-This will load the config parameter overrides in `config/finetune_shakespeare.py` (I didn't tune them much though). Basically, we initialize from a GPT2 checkpoint with `init_from` and train as normal, except shorter and with a small learning rate. The best checkpoint (lowest validation loss) will be in the `out_dir` directory, e.g. in `out-shakespeare` by default, per the config file. You can then run the code in `sample.py` to generate infinite Shakespeare. Note that you'll have to edit it to point to the correct `out_dir`.
+Training on 1 A100 40GB GPU overnight currently gets loss ~3.74, training on 4 gets ~3.60. Training on an 8 x A100 40GB node for 400,000 iters (~1 day) atm gets down to 3.1. Random chance at init is -ln(1/50257) = 10.82. Which brings us to baselines.

 ## baselines
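As a quick sanity check on the random-chance figure quoted above: a model that spreads probability uniformly over GPT-2's 50257-token vocabulary has cross-entropy -ln(1/50257) ≈ 10.82, matching the number in the diff. A minimal verification in Python:

```
import math

# Cross-entropy (in nats) of a uniform prediction over GPT-2's 50257 BPE tokens.
vocab_size = 50257
print(round(-math.log(1 / vocab_size), 2))  # 10.82
```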
@@ -76,6 +66,16 @@ and observe the following losses on train and val:

 I briefly tried finetuning gpt2 a bit more on our OWT and didn't notice dramatic improvements, suggesting that OWT is not much different from WT in terms of the data distribution, but this needs a more thorough attempt once the code is in a better place.

+## finetuning
+
+For an example of how to finetune a GPT on new text go to `data/shakespeare` and look at `prepare.py` to download the tiny shakespeare dataset and render it into a `train.bin` and `val.bin`. Unlike OpenWebText this will run in seconds. Finetuning takes very little time, e.g. on a single GPU just a few minutes. Run an example finetuning like:
+
+```
+$ python train.py finetune_shakespeare
+```
+
+This will load the config parameter overrides in `config/finetune_shakespeare.py` (I didn't tune them much though). Basically, we initialize from a GPT2 checkpoint with `init_from` and train as normal, except shorter and with a small learning rate. The best checkpoint (lowest validation loss) will be in the `out_dir` directory, e.g. in `out-shakespeare` by default, per the config file. You can then run the code in `sample.py` to generate infinite Shakespeare. Note that you'll have to edit it to point to the correct `out_dir`.
+
 ## benchmarking

 For model benchmarking `bench.py` might be useful. It's identical to what happens in the meat of the training loop of `train.py`, but omits much of the other complexities.
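The `prepare.py` referenced in the finetuning section added above is not part of this diff. A rough sketch of what it describes (download tiny shakespeare, split it, tokenize, write `train.bin`/`val.bin`), assuming GPT-2 BPE encoding via `tiktoken` and a uint16 on-disk format, might look like this:

```
# Hypothetical sketch of a data/shakespeare-style prepare.py: fetch the tiny
# shakespeare text, split 90/10, GPT-2 BPE-encode it with tiktoken, and dump
# the token ids as uint16 binaries for the training script to read.
import os
import requests
import numpy as np
import tiktoken

# the usual tiny shakespeare source
url = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
data = requests.get(url).text

n = len(data)
train_data = data[: int(n * 0.9)]
val_data = data[int(n * 0.9):]

enc = tiktoken.get_encoding("gpt2")
train_ids = enc.encode_ordinary(train_data)
val_ids = enc.encode_ordinary(val_data)

# GPT-2's vocab (50257 tokens) fits in uint16
out_dir = os.path.dirname(os.path.abspath(__file__))
np.array(train_ids, dtype=np.uint16).tofile(os.path.join(out_dir, "train.bin"))
np.array(val_ids, dtype=np.uint16).tofile(os.path.join(out_dir, "val.bin"))
```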
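The override file `config/finetune_shakespeare.py` is likewise not shown in the diff. A minimal sketch of the kind of overrides the paragraph describes (initialize from a GPT-2 checkpoint with `init_from`, train briefly with a small learning rate, write checkpoints to `out-shakespeare`) could look like the following; every concrete value here is an illustrative guess, not the repo's actual setting:

```
# Illustrative config overrides in the spirit of config/finetune_shakespeare.py.
# train.py applies overrides from a config like this; all values below are
# guesses for illustration, not the repo's tuned settings.
out_dir = "out-shakespeare"
dataset = "shakespeare"
init_from = "gpt2"      # start from a pretrained GPT-2 checkpoint
eval_interval = 250
eval_iters = 200
batch_size = 1
block_size = 512
learning_rate = 1e-5    # small LR: only nudge the pretrained weights
max_iters = 2000        # short run; finetuning converges in minutes
decay_lr = False
```

After the run, `sample.py` needs its `out_dir` pointed at `out-shakespeare` (per the note above) before it will sample from the finetuned checkpoint.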
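Since `bench.py` itself is not shown here either, the following is only a hedged sketch of the kind of stripped-down timing loop the last sentence describes: the forward/backward/update step of training on synthetic token batches, with evaluation, logging, and checkpointing left out. The `model.GPT`/`GPTConfig` usage and argument names are assumptions about the model API, not taken from the diff.

```
# Rough benchmarking sketch: time only the core train step on random data.
# Assumes a model.GPT / model.GPTConfig API roughly like nanoGPT's; the
# constructor arguments and forward signature are illustrative assumptions.
import time
import torch
from model import GPT, GPTConfig

device = "cuda" if torch.cuda.is_available() else "cpu"
batch_size, block_size = 8, 1024

model = GPT(GPTConfig(block_size=block_size)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

def get_batch():
    # random token ids stand in for real data; shapes are (batch, time)
    x = torch.randint(50257, (batch_size, block_size), device=device)
    y = torch.randint(50257, (batch_size, block_size), device=device)
    return x, y

for step in range(20):
    t0 = time.time()
    x, y = get_batch()
    logits, loss = model(x, y)            # forward assumed to return (logits, loss)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    if device == "cuda":
        torch.cuda.synchronize()          # wait for the GPU before reading the clock
    print(f"step {step}: {(time.time() - t0) * 1000:.1f} ms")
```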