Andrej Karpathy | 684800dd87 | clarify that these should be run on two separate machines | 2023-01-16 06:02:46 +00:00
Andrej Karpathy | 9352df23de | docs for multinode ddp | 2023-01-16 05:57:33 +00:00
Andrej Karpathy | c3dddbff3d | get rid of gpu_id, the world is more complicated than that when world_size > 8 | 2023-01-16 05:44:50 +00:00
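The gpu_id removal above reflects a multi-node subtlety: once world_size exceeds the GPUs on one node, a process's global rank no longer equals its device index. A minimal sketch of the usual rank-to-device mapping (function and parameter names here are illustrative, not nanoGPT's exact code):

```python
# Hedged sketch: in multi-node DDP the *global* rank spans all nodes, so the
# device index must come from the rank's position within its own node.
def device_for_rank(rank: int, gpus_per_node: int = 8) -> str:
    local_rank = rank % gpus_per_node  # this process's slot on its own node
    return f"cuda:{local_rank}"

print(device_for_rank(11))  # rank 11 on 8-GPU nodes -> device 3 on the second node
```

In practice launchers like torchrun export both `RANK` and `LOCAL_RANK` environment variables for exactly this reason.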
Andrej Karpathy | f5e6ac8b02 | local rank -> rank | 2023-01-16 05:13:13 +00:00
MicroPanda123 | d5ee965974 | Update README.md | 2023-01-15 20:29:15 +00:00
Andrej Karpathy | cf99914886 | add gradient accumulation support to simulate larger batch sizes. ty @VHellendoorn for original PR | 2023-01-15 17:49:55 +00:00
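Gradient accumulation, as added in cf99914886, averages gradients over several micro-batches before each optimizer step, simulating a batch `accum_steps` times larger than what fits in memory. A pure-Python stand-in for the idea (the real loop uses PyTorch and `loss.backward()`; names here are illustrative):

```python
# Hedged sketch of gradient accumulation: scale each micro-batch gradient by
# 1/accum_steps so the accumulated sum equals the mean over one big batch.
def accumulate(micro_batch_grads, accum_steps):
    grad = 0.0
    for g in micro_batch_grads[:accum_steps]:
        grad += g / accum_steps  # scaling here keeps the step size comparable
    return grad

print(accumulate([1.0, 2.0, 3.0, 6.0], 4))  # -> 3.0, the big-batch mean
```

The 1/accum_steps scaling is the easy-to-miss detail: without it the effective learning rate grows with the number of accumulation steps.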
Andrej Karpathy | 89da79eee1 | add note of caution for the produced warning, investigate later | 2023-01-14 20:38:22 +00:00
Andrej Karpathy | 7d7ded25ce | a bit better settings... for a single gpu at least. these settings would fry a simple cpu though i think | 2023-01-14 03:59:53 +00:00
Andrej Karpathy | 91d02510ce | fix bug... if topk > vocab_size, torch.topk will throw error | 2023-01-14 03:57:00 +00:00
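The topk bug in 91d02510ce is a one-line guard: `torch.topk(logits, k)` raises an error when `k` exceeds the last dimension, so `k` must be clamped to the vocabulary size first. A pure-Python stand-in for the sampling-time filter (list-based for illustration; the real code operates on tensors):

```python
# Hedged sketch of top-k filtering with the clamp fix: never ask for more
# candidates than the vocabulary actually contains.
def top_k_filter(logits, top_k):
    k = min(top_k, len(logits))  # the fix: clamp k to vocab_size
    threshold = sorted(logits, reverse=True)[k - 1]
    return [v if v >= threshold else float("-inf") for v in logits]

print(top_k_filter([0.1, 0.5, 0.2], top_k=200))  # k clamped to 3, nothing masked
```

With a small character-level vocabulary the default `top_k=200` would trip this immediately, which is likely how the bug surfaced.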
Andrej Karpathy | 57735f532d | correctly propagate the vocab_size from the rendered dataset into the model args | 2023-01-14 02:26:44 +00:00
Andrej Karpathy | 43b37fd568 | reverse the order, making sure that the final layer init is preserved, and becomes the token embedding instead of the other way around. otherwise the loss can be all messed up from a bad init | 2023-01-14 02:16:10 +00:00
Andrej Karpathy | 7c8288552b | tie the weights of lm_head.weight and transformer.wte.weight, i.e. the last linear layer of the decoder and the token embeddings. | 2023-01-14 01:00:55 +00:00
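Weight tying (7c8288552b) makes the output projection and the token embedding one and the same parameter, and the follow-up 43b37fd568 is about which side's initialization survives the assignment. A tiny sketch of the sharing semantics, with plain lists standing in for tensors:

```python
# Hedged sketch of weight tying: the lm_head and the token embedding (wte)
# reference ONE object, so an update through either name changes both.
wte_weight = [[0.0] * 4 for _ in range(10)]  # toy embedding: vocab_size x n_embd
lm_head_weight = wte_weight                  # tie: same object, not a copy

lm_head_weight[0][0] = 1.5                   # an update via the head...
print(wte_weight[0][0])                      # ...is visible via the embedding: 1.5
```

The ordering subtlety: in an assignment like `a.weight = b.weight`, whichever tensor sits on the right-hand side is the init that survives, which is why reversing the direction of the tie changed which layer's initialization was kept.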
Andrej Karpathy | 32b4f08d9d | it's true | 2023-01-13 23:43:00 +00:00
Andrej Karpathy | 3e0fd42579 | more scaling laws, clarification, and add simple interpolation of Approach 2 | 2023-01-13 00:57:15 +00:00
Andrej Karpathy | 8f85b83347 | inference time mini-optimization low-hanging fruit ty @jxtps for raising: when we are running inference we can apply lm_head on only the very last token | 2023-01-12 06:02:50 +00:00
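The inference mini-optimization in 8f85b83347: during generation only the last position's logits feed the next-token sample, so the expensive vocab-sized projection can be applied to one position instead of the whole sequence. A list-based sketch of the idea (toy shapes, illustrative names):

```python
# Hedged sketch: project hidden states to logits, but at inference time only
# for the final position (x[-1:]) -- one matmul instead of seq_len of them.
def project(x, w):  # x: list of n_embd vectors, w: vocab_size x n_embd rows
    return [[sum(a * b for a, b in zip(row, tok)) for row in w] for tok in x]

x = [[1.0, 2.0], [3.0, 4.0]]      # hidden states for a 2-token sequence
w = [[1.0, 0.0], [0.0, 1.0]]      # toy 2x2 lm_head weight (identity)
logits_last = project(x[-1:], w)  # only the last token is projected
print(logits_last)                # [[3.0, 4.0]]
```

Training still projects every position, since the loss is computed over all of them; the shortcut applies only when no targets are given.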
Andrej Karpathy | e21cbf887f | meant to set always_save_checkpoint to False instead, so we only write when val improves | 2023-01-12 05:47:34 +00:00
Andrej Karpathy | c1ac2d58f1 | including transformers as a dependency of the repo as well | 2023-01-12 02:42:38 +00:00
Andrej Karpathy | 7f51d17977 | add note about windows and pytorch 2.0 and torch compile in general | 2023-01-12 02:17:52 +00:00
Andrej Karpathy | bb49751439 | oh no nanoGPT is trending quickly explain the character-level functionality I added late last night | 2023-01-11 17:11:15 +00:00
Andrej Karpathy | d17350a31d | add support for character-level language models, a new character-level shakespeare dataset, a new config file that shows how to train a character-level baby GPT on it, and adjust the sample function to figure out if it should decode with characters or GPT2 bpe tokens. The current implementation is a bit hacky and basically assumes just these two possibilities. In the future we may want to support more general encoders or decoders. | 2023-01-11 05:27:19 +00:00
Andrej Karpathy | c2a402f7f7 | guess the config from globals() and log all of it with wandb | 2023-01-11 01:00:22 +00:00
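The config-from-globals trick in c2a402f7f7 pairs with the repo's configurator style: hyperparameters live as plain module-level variables, so the loggable config can be harvested by filtering `globals()` for simple values. A sketch under those assumptions (variable names illustrative):

```python
# Hedged sketch: treat every simple top-level variable as a hyperparameter
# and collect them into a dict suitable for logging (e.g. to wandb).
batch_size = 12
learning_rate = 6e-4
compile = True  # nanoGPT really does shadow the builtin with this config flag

config = {k: v for k, v in dict(globals()).items()
          if not k.startswith("_") and isinstance(v, (int, float, bool, str))}
print(config["batch_size"], config["learning_rate"])
```

Filtering on `int/float/bool/str` conveniently skips imported modules and functions without any explicit allow-list.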
Andrej Karpathy | 8b2e622b27 | adjust the readme to reflect changes in the autocast branch | 2023-01-08 19:40:46 +00:00
Andrej Karpathy | b77c2e86d3 | copy pasting what seems to work to bench,sample as well. ty @lantiga | 2023-01-08 19:32:13 +00:00
Andrej Karpathy | a855d316fd | add device and dtype support to train.py args | 2023-01-08 19:20:38 +00:00
Andrej | e7cd674ce7 | Merge pull request #20 from lantiga/wandb-optional-import: Make wandb import conditioned to wandb_log=True | 2023-01-08 10:19:40 -08:00
Luca Antiga | 09f1f458e8 | Move conditional import | 2023-01-08 15:51:50 +01:00
Luca Antiga | aba47f0a35 | Make wandb import conditioned to wandb_log=True | 2023-01-08 15:42:08 +01:00
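The conditional-import change from PR #20 is a common pattern for keeping an optional dependency optional: defer the `import` until the feature is actually switched on. A minimal sketch (the `wandb.init` call and project name are illustrative):

```python
import sys

# Hedged sketch of a conditional import: wandb is only pulled in when logging
# is enabled, so users who never set wandb_log=True need not install it.
wandb_log = False  # flipped to True via the config when logging is wanted

if wandb_log:
    import wandb  # deferred import: executes only on demand
    wandb.init(project="nanogpt")  # illustrative call, not run here

print("wandb" in sys.modules)  # False: nothing was imported
```

The follow-up commit ("Move conditional import") reflects the other half of the pattern: the guarded import must sit on the code path that uses it, not at the top of the file.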
Andrej Karpathy | e53b9d28ff | ran readme through spellchecker heh | 2023-01-08 01:46:54 +00:00
Andrej Karpathy | df3b8a57ab | tune the readme with new header image and the loss curve for 124M | 2023-01-08 00:41:14 +00:00
Andrej Karpathy | d56bdf05a6 | progress! based on chinchilla author correspondence | 2023-01-07 02:42:30 +00:00
Andrej Karpathy | 27fc6a4112 | small tweaks to notebook | 2023-01-06 02:13:04 +00:00
Andrej Karpathy | 69d1a5f1af | update scaling laws. basically i can't reproduce any of params, flops, or scaling laws of the Chinchilla paper atm... | 2023-01-06 02:01:08 +00:00
Andrej Karpathy | 9629093e53 | minor args re-arranging and removing some spurious ones like wandb entity ty @tcapelle | 2023-01-05 01:14:02 +00:00
Andrej | 529c967a65 | Merge pull request #19 from nat/patch-1: Strip unwanted prefix from state keys when loading model in sample.py | 2023-01-04 16:46:32 -08:00
Andrej Karpathy | d562b3e550 | shuttling the poor man's configurator aside into its own file and adding it to all of train,sample,bench. because i am leaving args in globals() so i can avoid having to prepend every single variable with an args., i have to exec the configurator and the optional configs. so we're left with something very gross by standard convention but also quite simple and functional. *ducks* | 2023-01-05 00:44:35 +00:00
Nat Friedman | 2b9e168736 | Strip unwanted prefix from state keys when loading model | 2023-01-04 16:39:30 -08:00
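The "unwanted prefix" fixed in 2b9e168736 (and patched earlier in ec9b1f8182) comes from torch.compile: the compiled wrapper saves its state dict with keys like `_orig_mod.<name>`, which a plain, uncompiled model refuses to load. A sketch of the stripping step (pure dict manipulation; the prefix string matches the one nanoGPT targets, but treat the helper name as illustrative):

```python
# Hedged sketch: rewrite checkpoint keys like "_orig_mod.lm_head.weight" back
# to "lm_head.weight" so an uncompiled model can load them.
def strip_prefix(state_dict, prefix="_orig_mod."):
    return {k[len(prefix):] if k.startswith(prefix) else k: v
            for k, v in state_dict.items()}

ckpt = {"_orig_mod.lm_head.weight": 1, "config": 2}
print(strip_prefix(ckpt))  # {'lm_head.weight': 1, 'config': 2}
```

Keys without the prefix pass through untouched, so the same loader works for checkpoints written with or without torch.compile.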
Andrej Karpathy | ab04701f9f | mention current 8GPU SOTA and shuffle sections a bit | 2023-01-04 18:59:10 +00:00
Andrej | 1eefbb2520 | Merge pull request #16 from jorahn/patch-1: Update README.md | 2023-01-04 09:08:50 -08:00
Jonathan Rahn | 26aa5f3ead | Update README.md | 2023-01-04 10:28:13 +01:00
Andrej Karpathy | c72ecf5d93 | add a notebook trying to reproduce chinchilla scaling laws. I can't get the numbers to be exactly right, have to look at more | 2023-01-04 00:59:34 +00:00
Andrej Karpathy | 5acba4b005 | ty lambda labs | 2023-01-03 21:16:07 +00:00
Andrej Karpathy | 97fc42616e | adding a few more dependencies | 2023-01-03 17:54:48 +00:00
Andrej Karpathy | 9f95aca93e | better hyperparams for gpt2 124M model on A100 40GB. still uncertain about max_iters especially, and a bit about weight decay, betas | 2023-01-03 17:45:49 +00:00
Andrej Karpathy | b45eec3e4b | flesh out the remaining TODOs in readme a bit more | 2023-01-03 07:41:28 +00:00
Andrej Karpathy | 177d5f7dc5 | disabling torch.jit.script here for massive performance boost when using torch.compile, our default. see issue #11. thanks @vgoklani for flagging | 2023-01-02 23:05:01 +00:00
Laiho | 0a2ea95338 | batch file write | 2023-01-02 17:49:21 +02:00
Andrej Karpathy | ea4de192e0 | reshuffle args inside sample.py | 2023-01-02 02:11:39 +00:00
Andrej Karpathy | ec9b1f8182 | add a patch to fix mysterious unwanted prefix in state dict? maybe remove later | 2023-01-02 01:25:02 +00:00
Andrej Karpathy | 41184a27f5 | rename compile_model to compile, shorter, version 2 stragglers | 2023-01-02 01:15:55 +00:00
Andrej Karpathy | 35f51974c4 | rename to compile, it's shorter | 2023-01-02 01:14:46 +00:00