mirror of https://github.com/osmarks/nanogpt-experiments.git synced 2024-11-10 20:09:58 +00:00

readme tweaks

This commit is contained in:
Andrej Karpathy 2022-12-29 02:00:25 +00:00
parent 3000cf5dda
commit ee6459f1d0


@@ -1,11 +1,9 @@
 # nanoGPT
-The cleanest, fastest repository for training/finetuning medium-sized GPTs.
-This repo currently requires reading the code, but it's not that bad. work ongoing...
-Getting started:
+The cleanest, fastest repository for training/finetuning medium-sized GPTs. Still under active development, currently trying to reproduce GPT-2 on OpenWebText dataset. The code itself is tiny, plain and readable. At the moment `train.py` is a ~200-line boilerplate training loop and `model.py` a ~300-line GPT model definition, which can also load the GPT-2 weights from OpenAI.
+## install
 We need a few dependencies:
@@ -15,20 +13,22 @@ We need a few dependencies:
 - `pip install tiktoken` for OpenAI's fast bpe code
 - `pip install wandb` for optional logging
-Then we want to render the detaset:
+## usage
+To render a dataset we first tokenize some documents into one giant array of indices. E.g. for OpenWebText see:
 ```
 $ cd data/openwebtext
 $ python prepare.py
 ```
-To download and tokenize the [openwebtext](https://huggingface.co/datasets/openwebtext) dataset. It will create a `train.bin` and `val.bin` which holds the GPT2 BPE token ids in a massive sequence. Then we're ready to kick off training. The training script currently tries to reproduce the smallest GPT-2 released by OpenAI, i.e. the 124M version of GPT-2. We can run it like so:
+To download and tokenize the [OpenWebText](https://huggingface.co/datasets/openwebtext) dataset. It will create a `train.bin` and `val.bin` which holds the GPT2 BPE token ids in one sequence, stored as raw uint16 bytes. Then we're ready to kick off training. The training script currently by default tries to reproduce the smallest GPT-2 released by OpenAI, i.e. the 124M version of GPT-2. We can train as follows, though I encourage you to read the code and see all of the settings and paths up top in the file:
 ```
 $ python train.py
 ```
-Once some checkpoints are written to the output directory `out`, we're ready to sample from the model:
+Once some checkpoints are written to the output directory `out`, we can sample from the model:
 ```
 $ python sample.py