mirror of https://github.com/osmarks/nanogpt-experiments.git synced 2024-12-18 06:00:29 +00:00
Commit Graph

10 Commits

Author SHA1 Message Date
Otavio Good 978d4fe538 Fix for gradient_accumulation_steps training slow 2023-03-25 00:04:45 -07:00
Andrej Karpathy fce706cbe6 tune the hyperparams a bit, in configs 2023-02-05 19:31:18 +00:00
Andrej Karpathy 3969860ff5 include launch command too. anyone should be able to do this now 2023-02-03 22:17:05 +00:00
Andrej Karpathy f9348f3f18 add gpt2 training config 2023-02-03 22:14:37 +00:00
Andrej Karpathy 7d7ded25ce a bit better settings... for a single gpu at least. these settings would fry a simple cpu though i think 2023-01-14 03:59:53 +00:00
Andrej Karpathy e21cbf887f meant to set always_save_checkpoint to False instead, so we only write when val improves 2023-01-12 05:47:34 +00:00
Andrej Karpathy d17350a31d add support for character-level language models, a new character-level shakespeare dataset, a new config file that shows how to train a character-level baby GPT on it, and adjust the sample function to figure out if it should decode with characters or GPT2 bpe tokens. The current implementation is a bit hacky and basically assumes just these two possibilities. In the future we may want to support more general encoders or decoders. 2023-01-11 05:27:19 +00:00
Andrej Karpathy 41184a27f5 rename compile_model to compile, shorter, version 2 stragglers 2023-01-02 01:15:55 +00:00
Andrej Karpathy 2febf4463c candidate changes to apis, have to think through more 2023-01-01 01:29:48 +00:00
Andrej Karpathy 5d2b4807bf adding a lightweight configurator that may be a terrible mistake lol. also adding configs to evaluate the baseline GPT2 versions released by OpenAI on OWT. we have some ways to go to match those numbers atm 2022-12-28 23:31:23 +00:00