mirror of https://github.com/osmarks/nanogpt-experiments.git synced 2024-11-10 20:09:58 +00:00
Testing various LLM-related things.

nanoGPT

The cleanest, fastest repository for training/finetuning medium-sized GPTs.

This repo currently requires reading the code, but it's not that bad. Work is ongoing...

Getting started:

We need a few dependencies:

  • pytorch, of course
  • numpy
  • pip install datasets for huggingface datasets
  • pip install tiktoken for OpenAI's fast bpe code
  • pip install wandb for optional logging

Then we want to prepare the dataset:

$ cd data/openwebtext
$ python prepare.py

This downloads and tokenizes the openwebtext dataset. It will create a train.bin and a val.bin, which hold the GPT-2 BPE token ids in one massive sequence. Then we're ready to kick off training. The training script currently tries to reproduce the smallest GPT-2 released by OpenAI, i.e. the 124M-parameter version. We can run it like so:

$ python train.py
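Under the hood, the training script memory-maps the .bin file and draws batches as random contiguous windows of tokens, with the targets being the same windows shifted right by one. A minimal numpy sketch of that data path (the toy file, the uint16 dtype assumption, and the tiny block/batch sizes are illustrative, not the script's actual defaults):

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-in for train.bin: prepare.py writes the token ids as a flat
# binary array (assumed uint16 here, since GPT-2's vocab fits in 16 bits)
np.arange(1000, dtype=np.uint16).tofile("toy_train.bin")

# memory-map the file instead of loading it all into RAM
data = np.memmap("toy_train.bin", dtype=np.uint16, mode="r")

# a batch is a set of random contiguous windows; the targets are the
# same windows shifted right by one token (next-token prediction)
block_size, batch_size = 8, 4
ix = rng.integers(0, len(data) - block_size, size=batch_size)
x = np.stack([data[i : i + block_size] for i in ix])
y = np.stack([data[i + 1 : i + 1 + block_size] for i in ix])
print(x.shape, y.shape)
```

Memory-mapping matters here because the tokenized openwebtext corpus is far larger than typical RAM; each batch touches only the bytes it needs.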

Once some checkpoints are written to the output directory out, we're ready to sample from the model:

$ python sample.py
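The core of autoregressive sampling is turning the model's logits for the next token into a probability distribution and drawing from it, one token at a time. A generic numpy sketch of that single step (the 3-token vocabulary and the temperature value are made up for illustration, not taken from sample.py):

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.1])  # hypothetical next-token scores
temperature = 0.8                   # <1 sharpens, >1 flattens the distribution

# temperature-scaled softmax (subtracting the max for numerical stability)
scaled = logits / temperature
probs = np.exp(scaled - scaled.max())
probs /= probs.sum()

# draw the next token id; generation appends it and repeats
next_token = rng.choice(len(probs), p=probs)
print(next_token, probs.round(3))
```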