Lewis
eeac8732b9
docs: simplify dependencies installation
Adds a `pip install ...` command that will install all necessary dependencies, while retaining original dependency notes. Added quick description of `tqdm` as well.
2023-05-31 23:04:08 -05:00
Guilherme
4732c43af3
add macbook specific instructions to generate samples
2023-04-15 09:49:38 +01:00
Andrej
7840a66859
Merge pull request #54 from MicroPanda123/luv
Give tqdm some love :)
2023-04-12 22:25:18 -07:00
Andrej
8abe215fba
Merge pull request #128 from abrahamsangha/fix-typo
fix typo
2023-04-12 22:24:41 -07:00
Snehal Raj
c58fc4605c
fix small typo
2023-03-25 20:36:46 +01:00
Andrej Karpathy
55c5069696
fix misinformation in readme
2023-02-10 16:34:46 +00:00
Abraham Sangha
27a5d6f123
fix typos
2023-02-07 11:02:20 -07:00
Andrej Karpathy
f83dd034e1
also add a sampling/inference section
2023-02-05 21:02:30 +00:00
Andrej Karpathy
23a8e701d2
revamp the readme file to be a bit better and more accessible, i hope
2023-02-05 19:31:32 +00:00
Andrej Karpathy
d2705bd92a
tune cited numbers and reproductions and more explicitly point out the problems w.r.t. the OWT vs WT domain gap
2023-01-31 21:57:07 +00:00
Andrej Karpathy
f29a9ff5bf
ok i tried bringing back original init again and this time it makes a ton of difference and works much better than default. i'm not sure what was different with my earlier experiment where i saw a slight regression. may try to dissect commits later, for now merged the original mingpt init (following gpt-2 paper) as default.
2023-01-27 17:56:18 +00:00
Andrej Karpathy
3cb3fc059c
grad clipping seems to slightly speed up training in the beginning but i can't see a big difference later in the training. it costs non-negligible compute to clip. adding it for now because it is standard, and i think more necessary as the model becomes larger. practitioners may consider turning it off for minor efficiency gains
2023-01-27 16:45:09 +00:00
Andrej Karpathy
1f77d03024
make mentions of mps in docs. ty good people in issue #28
2023-01-20 21:28:20 +00:00
Andrej Karpathy
2b083fbfde
the badge is a bit ugly, move it down to troubleshooting section
2023-01-18 03:16:59 +00:00
Andrej Karpathy
aa8e4c2546
screwed up the link, fix
2023-01-18 03:11:31 +00:00
Andrej Karpathy
6dab32c003
experimenting with badges, and discord link to start specifically. issues sometimes feel a little too heavy
2023-01-18 03:09:42 +00:00
Andrej Karpathy
7f74652843
add docs on multinode training to main README too
2023-01-16 17:11:02 +00:00
MicroPanda123
d5ee965974
Update README.md
2023-01-15 20:29:15 +00:00
Andrej Karpathy
32b4f08d9d
it's true
2023-01-13 23:43:00 +00:00
Andrej Karpathy
c1ac2d58f1
including transformers as a dependency of the repo as well
2023-01-12 02:42:38 +00:00
Andrej Karpathy
7f51d17977
add note about windows and pytorch 2.0 and torch compile in general
2023-01-12 02:17:52 +00:00
Andrej Karpathy
bb49751439
oh no nanoGPT is trending quickly explain the character-level functionality I added late last night
2023-01-11 17:11:15 +00:00
Andrej Karpathy
8b2e622b27
adjust the readme to reflect changes in the autocast branch
2023-01-08 19:40:46 +00:00
Luca Antiga
aba47f0a35
Make wandb import conditioned to wandb_log=True
2023-01-08 15:42:08 +01:00
Andrej Karpathy
e53b9d28ff
ran readme through spellchecker heh
2023-01-08 01:46:54 +00:00
Andrej Karpathy
df3b8a57ab
tune the readme with new header image and the loss curve for 124M
2023-01-08 00:41:14 +00:00
Andrej Karpathy
d562b3e550
shuttling the poor mans configurator aside into its own file and adding it to all of train,sample,bench. because i am leaving args in globals() so i can avoid having to prepend every single variable with an args., i have to exec the configurator and the optional configs. so we're left with something very gross by standard convention but also quite simple and functional. *ducks*
2023-01-05 00:44:35 +00:00
Andrej Karpathy
ab04701f9f
mention current 8GPU SOTA and shuffle sections a bit
2023-01-04 18:59:10 +00:00
Jonathan Rahn
26aa5f3ead
Update README.md
2023-01-04 10:28:13 +01:00
Andrej Karpathy
c72ecf5d93
add a notebook trying to reproduce chinchilla scaling laws. I can't get the numbers to be exactly right, have to look at more
2023-01-04 00:59:34 +00:00
Andrej Karpathy
5acba4b005
ty lambda labs
2023-01-03 21:16:07 +00:00
Andrej Karpathy
97fc42616e
adding few more dependencies
2023-01-03 17:54:48 +00:00
Andrej Karpathy
b45eec3e4b
flesh out the remaining TODOs in readme a bit more
2023-01-03 07:41:28 +00:00
Andrej Karpathy
2febf4463c
candidate changes to apis, have to think through more
2023-01-01 01:29:48 +00:00
Andrej Karpathy
7c6ea8409e
simplify the prepare script a lot, write only using one process, seems sufficient for now. ty @LaihoE for suggestion and @proger for flagging
2022-12-30 22:18:20 +00:00
Andrej Karpathy
d8abd21258
typo fix in readme
2022-12-30 00:07:58 +00:00
Andrej Karpathy
5a725d9098
add torch.compile by default, shows almost 1.8X improvement in throughput nice
2022-12-30 00:07:13 +00:00
Andrej Karpathy
f88aa2c2fe
add link to mingpt
2022-12-29 17:38:33 +00:00
Andrej Karpathy
f2fc4be69b
mention 4gpu loss as well in readme
2022-12-29 17:26:42 +00:00
Andrej Karpathy
e7bac659f5
oops missed one # have to fix
2022-12-29 05:24:14 +00:00
Andrej Karpathy
97e2ab1b8d
enhance readme, add some todos
2022-12-29 05:23:36 +00:00
Andrej Karpathy
dea1507252
add support for DDP training. the scaling timings right now do not look good by default, have to dig more into
2022-12-29 05:06:07 +00:00
Andrej Karpathy
ee6459f1d0
readme tweaks
2022-12-29 02:00:25 +00:00
Andrej Karpathy
b760ef1358
add data loading into benchmarking as well, just for completeness
2022-12-29 00:05:32 +00:00
Andrej Karpathy
70b5d93aee
add benchmarking script v0
2022-12-28 23:55:43 +00:00
Andrej Karpathy
5d2b4807bf
adding a lightweight configurator that may be a terrible mistake lol. also adding configs to evaluate the baseline GPT2 versions released by OpenAI on OWT. we have some ways to go to match those numbers atm
2022-12-28 23:31:23 +00:00
Andrej Karpathy
c9fe00c0e9
small readme clarification and training script defaults changes
2022-12-28 01:45:55 +00:00
Andrej Karpathy
fe8042867c
first very bad commit
2022-12-28 00:58:19 +00:00