mirror of https://github.com/osmarks/nanogpt-experiments.git synced 2024-11-10 20:09:58 +00:00

38 Commits

Author SHA1 Message Date
Andrej Karpathy
d2705bd92a tune cited numbers and reproductions and more explicitly point out the problems w.r.t. the OWT vs WT domain gap 2023-01-31 21:57:07 +00:00
Andrej Karpathy
f29a9ff5bf ok i tried bringing back original init again and this time it makes a ton of difference and works much better than default. i'm not sure what was different with my earlier experiment where i saw a slight regression. may try to dissect commits later, for now merged the original mingpt init (following gpt-2 paper) as default. 2023-01-27 17:56:18 +00:00
Andrej Karpathy
3cb3fc059c grad clipping seems to slightly speed up training in the beginning but i can't see a big difference later in the training. it costs non-negligeable compute to clip. adding it for now because it is standard, and i think more necessary as the model becomes larger. practitioners may consider turning it off for minor efficiency gains 2023-01-27 16:45:09 +00:00
Andrej Karpathy
1f77d03024 make mentions of mps in docs. ty good people in issue #28 2023-01-20 21:28:20 +00:00
Andrej Karpathy
2b083fbfde the badge is a bit ugly, move it down to troubleshooting section 2023-01-18 03:16:59 +00:00
Andrej Karpathy
aa8e4c2546 screwed up the link, fix 2023-01-18 03:11:31 +00:00
Andrej Karpathy
6dab32c003 experimenting with badges, and discord link to start specifically. issues sometimes feel a little too heavy 2023-01-18 03:09:42 +00:00
Andrej Karpathy
7f74652843 add docs on multinode training to main README too 2023-01-16 17:11:02 +00:00
Andrej Karpathy
32b4f08d9d it's true 2023-01-13 23:43:00 +00:00
Andrej Karpathy
c1ac2d58f1 including transformers as a dependency of the repo as well 2023-01-12 02:42:38 +00:00
Andrej Karpathy
7f51d17977 add note about windows and pytorch 2.0 and torch compile in general 2023-01-12 02:17:52 +00:00
Andrej Karpathy
bb49751439 oh no nanoGPT is trending quickly explain the character-level functionality I added late last night 2023-01-11 17:11:15 +00:00
Andrej Karpathy
8b2e622b27 adjust the readme to reflect changes in the autocast branch 2023-01-08 19:40:46 +00:00
Luca Antiga
aba47f0a35 Make wandb import conditioned to wandb_log=True 2023-01-08 15:42:08 +01:00
Andrej Karpathy
e53b9d28ff ran readme through spellchecker heh 2023-01-08 01:46:54 +00:00
Andrej Karpathy
df3b8a57ab tune the readme with new header image and the loss curve for 124M 2023-01-08 00:41:14 +00:00
Andrej Karpathy
d562b3e550 shuttling the poor mans configurator aside into its own file and adding it to all of train,sample,bench. because i am leaving args in globals() so i can avoid having to prepend every single variable with an args., i have to exec the configurator and the optional configs. so we're left with something very gross by standard convention but also quite simple and functional. *ducks* 2023-01-05 00:44:35 +00:00
Andrej Karpathy
ab04701f9f mention current 8GPU SOTA and shuffle sections a bit 2023-01-04 18:59:10 +00:00
Jonathan Rahn
26aa5f3ead Update README.md 2023-01-04 10:28:13 +01:00
Andrej Karpathy
c72ecf5d93 add a notebook trying to reproduce chinchilla scaling laws. I can't get the numbers to be exactly right, have to look at more 2023-01-04 00:59:34 +00:00
Andrej Karpathy
5acba4b005 ty lambda labs 2023-01-03 21:16:07 +00:00
Andrej Karpathy
97fc42616e adding few more dependencies 2023-01-03 17:54:48 +00:00
Andrej Karpathy
b45eec3e4b flesh out the remaining TODOs in readme a bit more 2023-01-03 07:41:28 +00:00
Andrej Karpathy
2febf4463c candidate changes to apis, have to think through more 2023-01-01 01:29:48 +00:00
Andrej Karpathy
7c6ea8409e simplify the prepare script a lot, write only using one process, seems sufficient for now. ty @LaihoE for suggestion and @proger for flagging 2022-12-30 22:18:20 +00:00
Andrej Karpathy
d8abd21258 typo fix in readme 2022-12-30 00:07:58 +00:00
Andrej Karpathy
5a725d9098 add torch.compile by default, shows almost 1.8X improvement in throughput nice 2022-12-30 00:07:13 +00:00
Andrej Karpathy
f88aa2c2fe add link to mingpt 2022-12-29 17:38:33 +00:00
Andrej Karpathy
f2fc4be69b mention 4gpu loss as well in readme 2022-12-29 17:26:42 +00:00
Andrej Karpathy
e7bac659f5 oops missed one # have to fix 2022-12-29 05:24:14 +00:00
Andrej Karpathy
97e2ab1b8d enhance readme, add some todos 2022-12-29 05:23:36 +00:00
Andrej Karpathy
dea1507252 add support for DDP training. the scaling timings right now do not look good by default, have to dig more into 2022-12-29 05:06:07 +00:00
Andrej Karpathy
ee6459f1d0 readme tweaks 2022-12-29 02:00:25 +00:00
Andrej Karpathy
b760ef1358 add data loading into benchmarking as well, just for completeness 2022-12-29 00:05:32 +00:00
Andrej Karpathy
70b5d93aee add benchmarking script v0 2022-12-28 23:55:43 +00:00
Andrej Karpathy
5d2b4807bf adding a lightweight configurator that may be a terrible mistake lol. also adding configs to evaluate the baseline GPT2 versions released by OpenAI on OWT. we have some ways to go to match those numbers atm 2022-12-28 23:31:23 +00:00
Andrej Karpathy
c9fe00c0e9 small readme clarification and training script defaults changes 2022-12-28 01:45:55 +00:00
Andrej Karpathy
fe8042867c first very bad commit 2022-12-28 00:58:19 +00:00