mirror of
https://github.com/osmarks/nanogpt-experiments.git
synced 2024-12-18 06:00:29 +00:00
Testing various LLM-related things.
data/openwebtext | ||
model.py | ||
README.md | ||
sample.py | ||
train.py |
nanoGPT
The cleanest, fastest repository for training/finetuning medium-sized GPTs.
This repo currently requires reading the code, but it's not that bad. work ongoing...
Getting started:
We need a few dependencies:
- pytorch, of course
- numpy
pip install datasets
for huggingface datasetspip install tiktoken
for OpenAI's fast bpe codepip install wandb
for optional logging
Then we want to render the detaset:
$ cd data/openwebtext
$ python prepare.py
To download and tokenize the openwebtext dataset. It will create a train.bin
and val.bin
which holds the GPT2 BPE token ids in a massive sequence. Then we're ready to kick off training. The training script currently tries to reproduce the smallest GPT-2 released by OpenAI, i.e. the 124M version of GPT-2. We can run it like so:
$ python train.py
Once some checkpoints are written to the output directory out
, we're ready to sample from the model:
$ python sample.py