Andrej Karpathy | 9f95aca93e | better hyperparams for gpt2 124M model on A100 40GB. still uncertain about max_iters especially, and a bit about weight decay, betas | 2023-01-03 17:45:49 +00:00
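A minimal sketch of the kind of optimizer settings this commit is tuning. The concrete numbers below (max_iters, weight decay, betas, learning rate) and the stand-in model are illustrative assumptions, not necessarily the values the commit settles on.

```python
import torch
import torch.nn as nn

# Illustrative hyperparameters for a GPT-2 124M style run on a single A100 40GB.
# These values are assumptions for the sketch, not taken from the commit itself.
max_iters = 600000          # the commit notes max_iters is still uncertain
learning_rate = 6e-4
weight_decay = 1e-1
betas = (0.9, 0.95)

model = nn.Linear(768, 768)  # stand-in for the real GPT module

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=learning_rate,
    betas=betas,
    weight_decay=weight_decay,
)
```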
Andrej Karpathy | ec9b1f8182 | add a patch to fix mysterious unwanted prefix in state dict? maybe remove later | 2023-01-02 01:25:02 +00:00
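The "unwanted prefix" is plausibly the key prefix that a wrapped model adds to its state dict (torch.compile, for example, stores the original module under `_orig_mod`, which shows up in checkpoint keys). A hedged sketch of the kind of patch involved; the `_orig_mod.` prefix, file name, and checkpoint layout are assumptions.

```python
import torch

ckpt = torch.load("ckpt.pt", map_location="cpu")    # illustrative path
state_dict = ckpt["model"]                          # assumed checkpoint layout

# Strip the wrapper prefix from the keys so an unwrapped model can load the weights.
unwanted_prefix = "_orig_mod."                      # assumed to come from a compiled/wrapped module
for k in list(state_dict.keys()):
    if k.startswith(unwanted_prefix):
        state_dict[k[len(unwanted_prefix):]] = state_dict.pop(k)

# model.load_state_dict(state_dict)  # model construction omitted in this sketch
```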
Andrej Karpathy | 35f51974c4 | rename to compile it's shorter | 2023-01-02 01:14:46 +00:00
Andrej Karpathy | 2febf4463c | candidate changes to apis, have to think through more | 2023-01-01 01:29:48 +00:00
Andrej Karpathy | 5a725d9098 | add torch.compile by default, shows almost 1.8X improvement in throughput nice | 2022-12-30 00:07:13 +00:00
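A sketch of what enabling torch.compile by default can look like (requires PyTorch 2.0+). The config flag name and the stand-in model are illustrative; the later "rename to compile" commit suggests the flag ends up simply being called `compile`.

```python
import torch
import torch.nn as nn

# stand-in model; in the real training script this would be the GPT module
model = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))

compile = True  # illustrative config flag, on by default
if compile:
    # JIT-compiles the model graph; this is where the ~1.8X throughput gain comes from
    model = torch.compile(model)
```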
Andrej Karpathy | 682a0ac8f1 | properly resume training, also loading iter_num and best_val_loss from checkpoints | 2022-12-29 18:23:15 +00:00
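A hedged sketch of resuming from a checkpoint that also carries `iter_num` and `best_val_loss`; the checkpoint keys, path, and stand-in model/optimizer are assumptions about the layout, not the commit's exact code.

```python
import torch
import torch.nn as nn

model = nn.Linear(768, 768)                              # stand-in for the real GPT model
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4)

ckpt_path = "out/ckpt.pt"                                # illustrative path
checkpoint = torch.load(ckpt_path, map_location="cpu")

# restore weights, optimizer state, and the bookkeeping needed to truly resume
model.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])
iter_num = checkpoint["iter_num"]
best_val_loss = checkpoint["best_val_loss"]
```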
Andrej Karpathy | dea1507252 | add support for DDP training. the scaling timings right now do not look good by default, have to dig more into | 2022-12-29 05:06:07 +00:00
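A minimal sketch of the DDP wiring implied here, assuming a torchrun-style launch that sets RANK/LOCAL_RANK/WORLD_SIZE; the model and device handling are illustrative.

```python
import os
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# assume launch via torchrun, which sets RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR, ...
torch.distributed.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
device = f"cuda:{local_rank}"
torch.cuda.set_device(device)

model = nn.Linear(768, 768).to(device)           # stand-in for the real model
model = DDP(model, device_ids=[local_rank])      # gradients are all-reduced across ranks
```

Launched with something like `torchrun --nproc_per_node=8 train.py`; per-GPU throughput that does not scale linearly under this wrapper is the "scaling timings" concern the commit mentions.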
Andrej Karpathy | 5d2b4807bf | adding a lightweight configurator that may be a terrible mistake lol. also adding configs to evaluate the baseline GPT2 versions released by OpenAI on OWT. we have some ways to go to match those numbers atm | 2022-12-28 23:31:23 +00:00
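A hedged sketch of what a "lightweight configurator" in this style can look like: plain globals in the training script get overridden by exec-ing a config file and/or `--key=value` command-line arguments. The exec-based approach, the argument format, and the example config path are assumptions about this commit.

```python
import sys
from ast import literal_eval

# defaults defined as plain globals in the training script
batch_size = 12
learning_rate = 6e-4
eval_interval = 2000

# override them from a config file (python that reassigns globals) or --key=value args
for arg in sys.argv[1:]:
    if not arg.startswith("--"):
        exec(open(arg).read())                  # e.g. config/eval_gpt2.py (illustrative)
    else:
        key, val = arg[2:].split("=", 1)
        assert key in globals(), f"unknown config key: {key}"
        try:
            val = literal_eval(val)             # parse numbers/bools, fall back to raw string
        except (SyntaxError, ValueError):
            pass
        globals()[key] = val
```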
Andrej Karpathy | c9fe00c0e9 | small readme clarification and training script defaults changes | 2022-12-28 01:45:55 +00:00
Andrej Karpathy | fe8042867c | first very bad commit | 2022-12-28 00:58:19 +00:00