mirror of https://github.com/osmarks/nanogpt-experiments.git synced 2024-12-18 14:10:28 +00:00
Commit Graph

22 Commits

Author SHA1 Message Date
Andrej Karpathy
d562b3e550 shuttling the poor mans configurator aside into its own file and adding it to all of train,sample,bench. because i am leaving args in globals() so i can avoid having to prepend every single variable with an args., i have to exec the configurator and the optional configs. so we're left with something very gross by standard convention but also quite simple and functional. *ducks* 2023-01-05 00:44:35 +00:00
Andrej Karpathy
ab04701f9f mention current 8GPU SOTA and shuffle sections a bit 2023-01-04 18:59:10 +00:00
Jonathan Rahn
26aa5f3ead Update README.md 2023-01-04 10:28:13 +01:00
Andrej Karpathy
c72ecf5d93 add a notebook trying to reproduce chinchilla scaling laws. I can't get the numbers to be exactly right, have to look at more 2023-01-04 00:59:34 +00:00
Andrej Karpathy
5acba4b005 ty lambda labs 2023-01-03 21:16:07 +00:00
Andrej Karpathy
97fc42616e adding few more dependencies 2023-01-03 17:54:48 +00:00
Andrej Karpathy
b45eec3e4b flesh out the remaining TODOs in readme a bit more 2023-01-03 07:41:28 +00:00
Andrej Karpathy
2febf4463c candidate changes to apis, have to think through more 2023-01-01 01:29:48 +00:00
Andrej Karpathy
7c6ea8409e simplify the prepare script a lot, write only using one process, seems sufficient for now. ty @LaihoE for suggestion and @proger for flagging 2022-12-30 22:18:20 +00:00
Andrej Karpathy
d8abd21258 typo fix in readme 2022-12-30 00:07:58 +00:00
Andrej Karpathy
5a725d9098 add torch.compile by default, shows almost 1.8X improvement in throughput nice 2022-12-30 00:07:13 +00:00
Andrej Karpathy
f88aa2c2fe add link to mingpt 2022-12-29 17:38:33 +00:00
Andrej Karpathy
f2fc4be69b mention 4gpu loss as well in readme 2022-12-29 17:26:42 +00:00
Andrej Karpathy
e7bac659f5 oops missed one # have to fix 2022-12-29 05:24:14 +00:00
Andrej Karpathy
97e2ab1b8d enhance readme, add some todos 2022-12-29 05:23:36 +00:00
Andrej Karpathy
dea1507252 add support for DDP training. the scaling timings right now do not look good by default, have to dig more into 2022-12-29 05:06:07 +00:00
Andrej Karpathy
ee6459f1d0 readme tweaks 2022-12-29 02:00:25 +00:00
Andrej Karpathy
b760ef1358 add data loading into benchmarking as well, just for completeness 2022-12-29 00:05:32 +00:00
Andrej Karpathy
70b5d93aee add benchmarking script v0 2022-12-28 23:55:43 +00:00
Andrej Karpathy
5d2b4807bf adding a lightweight configurator that may be a terrible mistake lol. also adding configs to evaluate the baseline GPT2 versions released by OpenAI on OWT. we have some ways to go to match those numbers atm 2022-12-28 23:31:23 +00:00
Andrej Karpathy
c9fe00c0e9 small readme clarification and training script defaults changes 2022-12-28 01:45:55 +00:00
Andrej Karpathy
fe8042867c first very bad commit 2022-12-28 00:58:19 +00:00