Andrej Karpathy
|
55c5069696
|
fix misinformation in readme
|
2023-02-10 16:34:46 +00:00 |
|
Andrej Karpathy
|
f83dd034e1
|
also add a sampling/inference section
|
2023-02-05 21:02:30 +00:00 |
|
Andrej Karpathy
|
23a8e701d2
|
revamp the readme file to be a bit better and more accessible, i hope
|
2023-02-05 19:31:32 +00:00 |
|
Andrej Karpathy
|
d2705bd92a
|
tune cited numbers and reproductions and more explicitly point out the problems w.r.t. the OWT vs WT domain gap
|
2023-01-31 21:57:07 +00:00 |
|
Andrej Karpathy
|
f29a9ff5bf
|
ok i tried bringing back original init again and this time it makes a ton of difference and works much better than default. i'm not sure what was different with my earlier experiment where i saw a slight regression. may try to dissect commits later, for now merged the original mingpt init (following gpt-2 paper) as default.
|
2023-01-27 17:56:18 +00:00 |
|
Andrej Karpathy
|
3cb3fc059c
|
grad clipping seems to slightly speed up training in the beginning but i can't see a big difference later in the training. it costs non-negligible compute to clip. adding it for now because it is standard, and i think more necessary as the model becomes larger. practitioners may consider turning it off for minor efficiency gains
|
2023-01-27 16:45:09 +00:00 |
|
Andrej Karpathy
|
1f77d03024
|
make mentions of mps in docs. ty good people in issue #28
|
2023-01-20 21:28:20 +00:00 |
|
Andrej Karpathy
|
2b083fbfde
|
the badge is a bit ugly, move it down to troubleshooting section
|
2023-01-18 03:16:59 +00:00 |
|
Andrej Karpathy
|
aa8e4c2546
|
screwed up the link, fix
|
2023-01-18 03:11:31 +00:00 |
|
Andrej Karpathy
|
6dab32c003
|
experimenting with badges, and discord link to start specifically. issues sometimes feel a little too heavy
|
2023-01-18 03:09:42 +00:00 |
|
Andrej Karpathy
|
7f74652843
|
add docs on multinode training to main README too
|
2023-01-16 17:11:02 +00:00 |
|
Andrej Karpathy
|
32b4f08d9d
|
it's true
|
2023-01-13 23:43:00 +00:00 |
|
Andrej Karpathy
|
c1ac2d58f1
|
including transformers as a dependency of the repo as well
|
2023-01-12 02:42:38 +00:00 |
|
Andrej Karpathy
|
7f51d17977
|
add note about windows and pytorch 2.0 and torch compile in general
|
2023-01-12 02:17:52 +00:00 |
|
Andrej Karpathy
|
bb49751439
|
oh no, nanoGPT is trending. quickly explain the character-level functionality I added late last night
|
2023-01-11 17:11:15 +00:00 |
|
Andrej Karpathy
|
8b2e622b27
|
adjust the readme to reflect changes in the autocast branch
|
2023-01-08 19:40:46 +00:00 |
|
Luca Antiga
|
aba47f0a35
|
Make wandb import conditioned to wandb_log=True
|
2023-01-08 15:42:08 +01:00 |
|
Andrej Karpathy
|
e53b9d28ff
|
ran readme through spellchecker heh
|
2023-01-08 01:46:54 +00:00 |
|
Andrej Karpathy
|
df3b8a57ab
|
tune the readme with new header image and the loss curve for 124M
|
2023-01-08 00:41:14 +00:00 |
|
Andrej Karpathy
|
d562b3e550
|
shuttling the poor man's configurator aside into its own file and adding it to all of train,sample,bench. because i am leaving args in globals() so i can avoid having to prepend every single variable with an args., i have to exec the configurator and the optional configs. so we're left with something very gross by standard convention but also quite simple and functional. *ducks*
|
2023-01-05 00:44:35 +00:00 |
|
Andrej Karpathy
|
ab04701f9f
|
mention current 8GPU SOTA and shuffle sections a bit
|
2023-01-04 18:59:10 +00:00 |
|
Jonathan Rahn
|
26aa5f3ead
|
Update README.md
|
2023-01-04 10:28:13 +01:00 |
|
Andrej Karpathy
|
c72ecf5d93
|
add a notebook trying to reproduce chinchilla scaling laws. I can't get the numbers to be exactly right, have to look at more
|
2023-01-04 00:59:34 +00:00 |
|
Andrej Karpathy
|
5acba4b005
|
ty lambda labs
|
2023-01-03 21:16:07 +00:00 |
|
Andrej Karpathy
|
97fc42616e
|
adding few more dependencies
|
2023-01-03 17:54:48 +00:00 |
|
Andrej Karpathy
|
b45eec3e4b
|
flesh out the remaining TODOs in readme a bit more
|
2023-01-03 07:41:28 +00:00 |
|
Andrej Karpathy
|
2febf4463c
|
candidate changes to apis, have to think through more
|
2023-01-01 01:29:48 +00:00 |
|
Andrej Karpathy
|
7c6ea8409e
|
simplify the prepare script a lot, write only using one process, seems sufficient for now. ty @LaihoE for suggestion and @proger for flagging
|
2022-12-30 22:18:20 +00:00 |
|
Andrej Karpathy
|
d8abd21258
|
typo fix in readme
|
2022-12-30 00:07:58 +00:00 |
|
Andrej Karpathy
|
5a725d9098
|
add torch.compile by default, shows almost 1.8X improvement in throughput nice
|
2022-12-30 00:07:13 +00:00 |
|
Andrej Karpathy
|
f88aa2c2fe
|
add link to mingpt
|
2022-12-29 17:38:33 +00:00 |
|
Andrej Karpathy
|
f2fc4be69b
|
mention 4gpu loss as well in readme
|
2022-12-29 17:26:42 +00:00 |
|
Andrej Karpathy
|
e7bac659f5
|
oops missed one # have to fix
|
2022-12-29 05:24:14 +00:00 |
|
Andrej Karpathy
|
97e2ab1b8d
|
enhance readme, add some todos
|
2022-12-29 05:23:36 +00:00 |
|
Andrej Karpathy
|
dea1507252
|
add support for DDP training. the scaling timings right now do not look good by default, have to dig more into
|
2022-12-29 05:06:07 +00:00 |
|
Andrej Karpathy
|
ee6459f1d0
|
readme tweaks
|
2022-12-29 02:00:25 +00:00 |
|
Andrej Karpathy
|
b760ef1358
|
add data loading into benchmarking as well, just for completeness
|
2022-12-29 00:05:32 +00:00 |
|
Andrej Karpathy
|
70b5d93aee
|
add benchmarking script v0
|
2022-12-28 23:55:43 +00:00 |
|
Andrej Karpathy
|
5d2b4807bf
|
adding a lightweight configurator that may be a terrible mistake lol. also adding configs to evaluate the baseline GPT2 versions released by OpenAI on OWT. we have some ways to go to match those numbers atm
|
2022-12-28 23:31:23 +00:00 |
|
Andrej Karpathy
|
c9fe00c0e9
|
small readme clarification and training script defaults changes
|
2022-12-28 01:45:55 +00:00 |
|
Andrej Karpathy
|
fe8042867c
|
first very bad commit
|
2022-12-28 00:58:19 +00:00 |
|