mirror of
https://github.com/osmarks/nanogpt-experiments.git
synced 2025-09-02 02:47:58 +00:00
add support for character-level language models: a new character-level Shakespeare dataset, a new config file that shows how to train a character-level baby GPT on it, and an adjusted sample function that figures out whether it should decode with characters or GPT-2 BPE tokens. The current implementation is a bit hacky and basically assumes just these two possibilities. In the future we may want to support more general encoders or decoders.
This commit is contained in:
9
data/shakespeare_char/readme.md
Normal file
@@ -0,0 +1,9 @@
+# tiny shakespeare, character-level
+
+Tiny shakespeare, of the good old char-rnn fame :) Treated on character-level.
+
+After running `prepare.py`:
+
+- train.bin has 1,003,854 tokens
+- val.bin has 111,540 tokens
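The `train.bin`/`val.bin` split above can be sketched roughly as follows. This is a guess at the shape of a char-level `prepare.py` (encode the raw text, split it, write the ids as 16-bit binary files), not the commit's actual script:

```python
# Hypothetical sketch of a character-level prepare step: build the vocab
# from the raw text, encode it, split 90/10, and dump uint16 token ids.
import array

def prepare(text, train_path="train.bin", val_path="val.bin"):
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    ids = [stoi[c] for c in text]
    n = int(0.9 * len(ids))                     # 90/10 train/val split
    train_ids, val_ids = ids[:n], ids[n:]
    with open(train_path, "wb") as f:
        array.array("H", train_ids).tofile(f)   # "H" = unsigned 16-bit
    with open(val_path, "wb") as f:
        array.array("H", val_ids).tofile(f)
    return len(train_ids), len(val_ids)
```

The 90/10 split is consistent with the counts above (1,003,854 train vs. 111,540 val tokens on the ~1.1M-character corpus).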