mirror of https://github.com/osmarks/nanogpt-experiments.git (synced 2024-12-18 14:10:28 +00:00)

small readme clarification and training script defaults changes

commit c9fe00c0e9 (parent fe8042867c)
README.md
@@ -15,12 +15,14 @@ We need a few dependencies:
 - `pip install tiktoken` for OpenAI's fast bpe code
 - `pip install wandb` for optional logging
 
 Then we want to render the dataset:
 
 ```
 $ cd data/openwebtext
 $ python prepare.py
 ```
 
-To download and tokenize the [openwebtext](https://huggingface.co/datasets/openwebtext) dataset. It will create a `train.bin` and `val.bin` which hold the GPT2 BPE token ids in a massive sequence. Then we're ready to kick off training. First open up train.py and read it, make sure the settings look ok. Then:
+To download and tokenize the [openwebtext](https://huggingface.co/datasets/openwebtext) dataset. It will create a `train.bin` and `val.bin` which hold the GPT2 BPE token ids in a massive sequence. Then we're ready to kick off training. The training script currently tries to reproduce the smallest GPT-2 released by OpenAI, i.e. the 124M version of GPT-2. We can run it like so:
 
 ```
 $ python train.py
 ```
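After `prepare.py` finishes, it is worth sanity-checking the bins before burning GPU time on training. A minimal sketch, assuming the files are flat arrays of uint16 GPT-2 BPE token ids (which is what nanoGPT's `prepare.py` writes) and that `tiktoken` is installed:

```python
# peek at the tokenized dataset produced by data/openwebtext/prepare.py
import numpy as np
import tiktoken

enc = tiktoken.get_encoding("gpt2")

# memmap avoids loading the multi-gigabyte file into RAM
train = np.memmap("data/openwebtext/train.bin", dtype=np.uint16, mode="r")
print(f"train.bin holds {len(train):,} tokens")

# decode a short prefix back to text as a sanity check
print(enc.decode(train[:32].tolist()))
```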
train.py
@@ -19,14 +19,14 @@ out_dir = 'out'
 eval_interval = 500
 log_interval = 1
 # wandb logging
-wandb_log = False
+wandb_log = False # disabled by default
 wandb_entity = 'karpathy'
 wandb_project = 'owt'
-wandb_run_name = 'owt1' # 'run' + str(time.time())
+wandb_run_name = 'gpt2' # 'run' + str(time.time())
 # data
 dataset = 'openwebtext'
-batch_size = 32
-block_size = 512
+batch_size = 8
+block_size = 1024
 # model
 device = 'cuda:0'
 init_from = 'scratch' # 'scratch' or 'resume' or 'gpt2*'
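The defaults change trades batch size for context length: each iteration now draws 8 sequences of 1024 tokens (8 × 1024 = 8192 tokens per step) rather than 32 sequences of 512, and `block_size = 1024` matches the full context length of the GPT-2 124M model being reproduced. A sketch of how these two knobs shape a batch, modeled on nanoGPT-style data loading (illustrative, not the exact `train.py` code):

```python
# how batch_size/block_size determine what one training step sees
import numpy as np
import torch

batch_size = 8     # new default: sequences per iteration
block_size = 1024  # new default: GPT-2's full context length

data = np.memmap("data/openwebtext/train.bin", dtype=np.uint16, mode="r")

def get_batch():
    # sample batch_size random offsets, slice out block_size-token windows
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([torch.from_numpy(data[i:i + block_size].astype(np.int64)) for i in ix])
    # targets are the same windows shifted one token to the right
    y = torch.stack([torch.from_numpy(data[i + 1:i + 1 + block_size].astype(np.int64)) for i in ix])
    return x, y

x, y = get_batch()
print(x.shape)  # torch.Size([8, 1024])
```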