1
0
mirror of https://github.com/osmarks/nanogpt-experiments.git synced 2024-11-10 20:09:58 +00:00
Commit Graph

14 Commits

Author SHA1 Message Date
Andrej Karpathy
e808a67149 bunch of plumbing of bias all around. measuring bias=False to be about 6% faster 2023-01-27 20:41:17 +00:00
Andrej Karpathy
cc5444e194 add the bias option to config, default it to True for now 2023-01-27 20:29:45 +00:00
Andrej Karpathy
2bf07a3fbf rewrite model class so layernorm has an optional bias= parameter 2023-01-27 20:17:32 +00:00
Andrej Karpathy
2892858ce7 attempt a non-biased model, per few papers that cite this as working well 2023-01-27 18:54:08 +00:00
Andrej Karpathy
23a0bfac20 try bring back mingpt init 2023-01-27 16:52:18 +00:00
Andrej Karpathy
89da79eee1 add note of caution for the produced warning, investigate later 2023-01-14 20:38:22 +00:00
Andrej Karpathy
91d02510ce fix bug... if topk > vocab_size, torch.topk will throw error 2023-01-14 03:57:00 +00:00
Andrej Karpathy
43b37fd568 reverse the order, making sure that the final layer init is preserved, and becomes the token embedding instead of the other way around. otherwise the loss can be all messed up from a bad init 2023-01-14 02:16:10 +00:00
Andrej Karpathy
7c8288552b tie the weights of lm_head.weight and transformer.wte.weight, i.e. the last linear layer of decoder and the token embeddings. 2023-01-14 01:00:55 +00:00
Andrej Karpathy
8f85b83347 inference time mini-optimization low-hanging fruit ty @jxtps for raising: when we are running inference we can apply lm_head on only the very last token 2023-01-12 06:02:50 +00:00
Andrej Karpathy
177d5f7dc5 disabling torch.jit.script here for massive performance boost when using torch.compile, our default. see issue #11. thanks @vgoklani for flagging 2023-01-02 23:05:01 +00:00
Andrej Karpathy
2febf4463c candidate changes to apis, have to think through more 2023-01-01 01:29:48 +00:00
ankandrew
7f0e6d9a71 Frozen GPTConfig 2022-12-29 17:07:19 -03:00
Andrej Karpathy
fe8042867c first very bad commit 2022-12-28 00:58:19 +00:00