small readme clarification and training script defaults changes

2025-10-20 10:07:40 +00:00 · 2022-12-28 01:45:55 +00:00
parent fe8042867c
commit c9fe00c0e9
2 changed files with 7 additions and 5 deletions
--- a/README.md
+++ b/README.md
@@ -15,12 +15,14 @@ We need a few dependencies:
 - `pip install tiktoken` for OpenAI's fast bpe code
 - `pip install wandb` for optional logging

+Then we want to render the dataset:
+
 ```
 $ cd data/openwebtext
 $ python prepare.py
 ```

-To download and tokenize the [openwebtext](https://huggingface.co/datasets/openwebtext) dataset. It will create a `train.bin` and `val.bin` which holds the GPT2 BPE token ids in a massive sequence. Then we're ready to kick off training. First open up train.py and read it, make sure the settings look ok. Then:
+This downloads and tokenizes the [openwebtext](https://huggingface.co/datasets/openwebtext) dataset. It will create a `train.bin` and a `val.bin`, which hold the GPT-2 BPE token ids in a massive sequence. Then we're ready to kick off training. The training script currently tries to reproduce the smallest GPT-2 released by OpenAI, i.e. the 124M version of GPT-2. We can run it like so:

 ```
 $ python train.py
--- a/train.py
+++ b/train.py
@@ -19,14 +19,14 @@ out_dir = 'out'
 eval_interval = 500
 log_interval = 1
 # wandb logging
-wandb_log = False
+wandb_log = False # disabled by default
 wandb_entity = 'karpathy'
 wandb_project = 'owt'
-wandb_run_name = 'owt1' # 'run' + str(time.time())
+wandb_run_name = 'gpt2' # 'run' + str(time.time())
 # data
 dataset = 'openwebtext'
-batch_size = 32
-block_size = 512
+batch_size = 8
+block_size = 1024
 # model
 device = 'cuda:0'
 init_from = 'scratch' # 'scratch' or 'resume' or 'gpt2*'
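
A note on the new data defaults: `block_size = 1024` matches the context length of GPT-2, while the smaller `batch_size` halves the number of tokens in each batch. The sketch below only does that arithmetic. `tokens_per_batch` is a hypothetical helper, not something defined in `train.py`, and it assumes each batch holds `batch_size` sequences of `block_size` tokens.

```python
def tokens_per_batch(batch_size: int, block_size: int) -> int:
    """Tokens in one batch, assuming batch_size sequences of block_size tokens."""
    return batch_size * block_size


# Defaults before and after this commit (values taken from the diff above).
old_tokens = tokens_per_batch(batch_size=32, block_size=512)
new_tokens = tokens_per_batch(batch_size=8, block_size=1024)

print(f"old defaults: {old_tokens} tokens per batch")
print(f"new defaults: {new_tokens} tokens per batch")
print(f"ratio new/old: {new_tokens / old_tokens}")
```

With these values the old defaults give 16384 tokens per batch and the new ones give 8192, so each step sees half as many tokens but every sequence is twice as long.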