mirror of
https://github.com/osmarks/nanogpt-experiments.git
synced 2025-01-05 15:00:28 +00:00
Merge pull request #247 from gnobre/macbook-run-instructions
Macbook run instructions
commit 196160b849
@@ -76,6 +76,11 @@ $ python train.py config/train_shakespeare_char.py --device=cpu --compile=False
Here, since we are running on CPU instead of GPU we must set both `--device=cpu` and also turn off PyTorch 2.0 compile with `--compile=False`. Then when we evaluate we get a bit more noisy but faster estimate (`--eval_iters=20`, down from 200), our context size is only 64 characters instead of 256, and the batch size only 12 examples per iteration, not 64. We'll also use a much smaller Transformer (4 layers, 4 heads, 128 embedding size), and decrease the number of iterations to 2000 (and correspondingly usually decay the learning rate to around max_iters with `--lr_decay_iters`). Because our network is so small we also ease down on regularization (`--dropout=0.0`). This still runs in about ~3 minutes, but gets us a loss of only 1.88 and therefore also worse samples, but it's still good fun:
+```
+$ python sample.py --out_dir=out-shakespeare-char --device=cpu
+```
+Generates samples like this:
+
```
GLEORKEN VINGHARD III:
Whell's the couse, the came light gacks,
```
@@ -86,7 +91,7 @@ No relving thee post mose the wear
Not bad for ~3 minutes on a CPU, for a hint of the right character gestalt. If you're willing to wait longer, feel free to tune the hyperparameters, increase the size of the network, the context length (`--block_size`), the length of training, etc.
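As a rough starting point for that tuning, the CPU overrides described above can be kept in one place and turned into a `train.py` command line. This is only a sketch: the key names `batch_size`, `n_layer`, `n_head` and `n_embd` are assumed to match the config variables `train.py` expects (the text quotes only some of the flags), `build_command` is a hypothetical helper, and the "bigger" values are illustrative rather than recommended.

```python
# CPU-sized overrides taken from the description above.
# Assumption: batch_size, n_layer, n_head and n_embd are the names train.py expects.
cpu_overrides = {
    "device": "cpu",
    "compile": False,
    "eval_iters": 20,
    "block_size": 64,
    "batch_size": 12,
    "n_layer": 4,
    "n_head": 4,
    "n_embd": 128,
    "max_iters": 2000,
    "lr_decay_iters": 2000,
    "dropout": 0.0,
}


def build_command(overrides):
    """Turn a dict of overrides into a train.py command line (list of args)."""
    flags = [f"--{key}={value}" for key, value in overrides.items()]
    return ["python", "train.py", "config/train_shakespeare_char.py", *flags]


# Illustrative only: a longer context and more iterations, if you're willing to wait.
# lr_decay_iters is kept equal to max_iters, as suggested in the text.
bigger = {**cpu_overrides, "block_size": 128, "max_iters": 5000, "lr_decay_iters": 5000}
print(" ".join(build_command(bigger)))
```

Keeping the overrides in a dict makes it easy to change one knob at a time and see which command was actually run.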
-Finally, on Apple Silicon Macbooks and with a recent PyTorch version make sure to add `--device mps` (short for "Metal Performance Shaders"); PyTorch then uses the on-chip GPU that can *significantly* accelerate training (2-3X) and allow you to use larger networks. See [Issue 28](https://github.com/karpathy/nanoGPT/issues/28) for more.
+Finally, on Apple Silicon Macbooks and with a recent PyTorch version make sure to add `--device=mps` (short for "Metal Performance Shaders"); PyTorch then uses the on-chip GPU that can *significantly* accelerate training (2-3X) and allow you to use larger networks. See [Issue 28](https://github.com/karpathy/nanoGPT/issues/28) for more.
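If you'd rather not remember which value to pass, you can ask PyTorch what is available and build the flag from that. A minimal sketch, assuming a PyTorch build recent enough to expose `torch.backends.mps`; the helpers `pick_device` and `detect_device` are hypothetical and not part of nanoGPT, and the cuda-then-mps-then-cpu preference order is just one reasonable choice.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then fall back to the CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch which backends are usable; fall back to 'cpu' without PyTorch."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds have no torch.backends.mps at all, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    mps_available = mps is not None and mps.is_available()
    return pick_device(torch.cuda.is_available(), mps_available)


# Prints the flag to append to the train.py / sample.py command line.
print(f"--device={detect_device()}")
```

On an Apple Silicon Macbook with MPS support this should print `--device=mps`; anywhere else it falls back to whatever is available.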
## reproducing GPT-2