mirror of https://github.com/osmarks/nanogpt-experiments.git synced 2024-09-21 11:49:46 +00:00
Commit Graph

29 Commits

Author SHA1 Message Date
Andrej Karpathy
3cb3fc059c grad clipping seems to slightly speed up training in the beginning but i can't see a big difference later in the training. it costs non-negligible compute to clip. adding it for now because it is standard, and i think more necessary as the model becomes larger. practitioners may consider turning it off for minor efficiency gains 2023-01-27 16:45:09 +00:00
Andrej
3611338959 Merge pull request #71 from cchan/patch-1: Zero-grad more aggressively to save memory 2023-01-20 14:38:10 -08:00
Andrej Karpathy
1f77d03024 make mentions of mps in docs. ty good people in issue #28 2023-01-20 21:28:20 +00:00
Clive Chan
67166079c9 Zero-grad more aggressively to save memory 2023-01-19 22:10:44 -08:00
Andrej Karpathy
46ce9971df small tweaks to docs and variable names stylistically 2023-01-16 16:56:05 +00:00
Andrej Karpathy
684800dd87 clarify that these should be run on two separate machines 2023-01-16 06:02:46 +00:00
Andrej Karpathy
9352df23de docs for multinode ddp 2023-01-16 05:57:33 +00:00
Andrej Karpathy
c3dddbff3d get rid of gpu_id, the world is more complicated than that when world_size > 8 2023-01-16 05:44:50 +00:00
Andrej Karpathy
f5e6ac8b02 local rank -> rank 2023-01-16 05:13:13 +00:00
Andrej Karpathy
cf99914886 add gradient accumulation support to simulate larger batch sizes. ty @VHellendoorn for original PR 2023-01-15 17:49:55 +00:00
Andrej Karpathy
57735f532d correctly propagate the vocab_size from the rendered dataset into the model args 2023-01-14 02:26:44 +00:00
Andrej Karpathy
8f85b83347 inference time mini-optimization low-hanging fruit ty @jxtps for raising: when we are running inference we can apply lm_head on only the very last token 2023-01-12 06:02:50 +00:00
Andrej Karpathy
d17350a31d add support for character-level language models, a new character-level shakespeare dataset, a new config file that shows how to train a character-level baby GPT on it, and adjust the sample function to figure out if it should decode with characters or GPT2 bpe tokens. The current implementation is a bit hacky and basically assumes just these two possibilities. In the future we may want to support more general encoders or decoders. 2023-01-11 05:27:19 +00:00
Andrej Karpathy
c2a402f7f7 guess the config from globals() and log all of it with wandb 2023-01-11 01:00:22 +00:00
Andrej Karpathy
a855d316fd add device and dtype support to train.py args 2023-01-08 19:20:38 +00:00
Luca Antiga
09f1f458e8 Move conditional import 2023-01-08 15:51:50 +01:00
Luca Antiga
aba47f0a35 Make wandb import conditioned to wandb_log=True 2023-01-08 15:42:08 +01:00
Andrej Karpathy
9629093e53 minor args re-arranging and removing some spurious ones like wandb entity ty @tcapelle 2023-01-05 01:14:02 +00:00
Andrej Karpathy
d562b3e550 shuttling the poor mans configurator aside into its own file and adding it to all of train,sample,bench. because i am leaving args in globals() so i can avoid having to prepend every single variable with an args., i have to exec the configurator and the optional configs. so we're left with something very gross by standard convention but also quite simple and functional. *ducks* 2023-01-05 00:44:35 +00:00
Andrej Karpathy
9f95aca93e better hyperparams for gpt2 124M model on A100 40GB. still uncertain about max_iters especially, and a bit about weight decay, betas 2023-01-03 17:45:49 +00:00
Andrej Karpathy
ec9b1f8182 add a patch to fix mysterious unwanted prefix in state dict? maybe remove later 2023-01-02 01:25:02 +00:00
Andrej Karpathy
35f51974c4 rename to compile it's shorter 2023-01-02 01:14:46 +00:00
Andrej Karpathy
2febf4463c candidate changes to apis, have to think through more 2023-01-01 01:29:48 +00:00
Andrej Karpathy
5a725d9098 add torch.compile by default, shows almost 1.8X improvement in throughput nice 2022-12-30 00:07:13 +00:00
Andrej Karpathy
682a0ac8f1 properly resume training, also loading iter_num and best_val_loss from checkpoints 2022-12-29 18:23:15 +00:00
Andrej Karpathy
dea1507252 add support for DDP training. the scaling timings right now do not look good by default, have to dig more into 2022-12-29 05:06:07 +00:00
Andrej Karpathy
5d2b4807bf adding a lightweight configurator that may be a terrible mistake lol. also adding configs to evaluate the baseline GPT2 versions released by OpenAI on OWT. we have some ways to go to match those numbers atm 2022-12-28 23:31:23 +00:00
Andrej Karpathy
c9fe00c0e9 small readme clarification and training script defaults changes 2022-12-28 01:45:55 +00:00
Andrej Karpathy
fe8042867c first very bad commit 2022-12-28 00:58:19 +00:00
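
Two of the commits above (cf99914886 and 3cb3fc059c) mention gradient accumulation "to simulate larger batch sizes" and gradient clipping. The following is a minimal pure-Python sketch of those two ideas on a toy one-weight linear model. It is not the repository's code: the function names (`mse_grad`, `accumulated_grad`, `clip_grad`), the data, and the choice of a scalar weight are all assumptions made for illustration. It shows that summing micro-batch gradients, each scaled by `1/accum_steps`, reproduces the full-batch gradient when the micro-batches are equal in size, and that norm clipping rescales the result only when it exceeds a threshold.

```python
import math

def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def clip_grad(grad, max_norm):
    """Rescale a scalar gradient so its magnitude does not exceed max_norm."""
    norm = abs(grad)
    return grad * max_norm / norm if norm > max_norm else grad

def accumulated_grad(w, xs, ys, accum_steps):
    """Sum per-micro-batch gradients, each scaled by 1/accum_steps.

    Assumes len(xs) is divisible by accum_steps (equal-size micro-batches).
    """
    micro = len(xs) // accum_steps
    total = 0.0
    for i in range(accum_steps):
        sl = slice(i * micro, (i + 1) * micro)
        total += mse_grad(w, xs[sl], ys[sl]) / accum_steps
    return total

# Hypothetical toy data: the true relationship is y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full = mse_grad(w, xs, ys)                           # one batch of 4
accum = accumulated_grad(w, xs, ys, accum_steps=2)   # two micro-batches of 2
clipped = clip_grad(accum, max_norm=1.0)             # clipped before the update
```

In this sketch `full` and `accum` agree, which is the sense in which accumulation stands in for a larger batch; clipping is then applied once to the accumulated gradient rather than to each micro-batch.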