mirror of
https://github.com/osmarks/nanogpt-experiments.git
synced 2024-11-10 20:09:58 +00:00
be571fff2c
Before: length of dataset in characters: 1115394 all the unique characters: !$&',-.3:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz vocab size: 65 train has 1003854 tokens val has 111540 tokens After: length of dataset in characters: 1,115,394 all the unique characters: !$&',-.3:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz vocab size: 65 train has 1,003,854 tokens val has 111,540 tokens |
||
---|---|---|
.. | ||
prepare.py | ||
readme.md |
tiny shakespeare, character-level
Tiny shakespeare, of the good old char-rnn fame :) Treated on character-level.
After running prepare.py
:
- train.bin has 1,003,854 tokens
- val.bin has 111,540 tokens