Andrej Karpathy | 7c8288552b | tie the weights of lm_head.weight and transformer.wte.weight, i.e. the last linear layer of decoder and the token embeddings. | 2023-01-14 01:00:55 +00:00
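Weight tying works because the token embedding and the output head have the same shape: in PyTorch, `nn.Embedding(vocab_size, n_embd).weight` and `nn.Linear(n_embd, vocab_size).weight` are both `(vocab_size, n_embd)`, so one matrix can serve both roles. The framework-free sketch below assumes plain nested lists stand in for tensors, and the class name `TinyTiedLM` is hypothetical. It shows that the head is the same object as the embedding table (not a copy), so an update through one is visible through the other and the parameter count is halved.

```python
import random


class TinyTiedLM:
    """Hypothetical toy model: plain nested lists stand in for tensors."""

    def __init__(self, vocab_size, n_embd, seed=0):
        rng = random.Random(seed)
        # token embedding table, shape (vocab_size, n_embd)
        self.wte = [[rng.uniform(-0.1, 0.1) for _ in range(n_embd)]
                    for _ in range(vocab_size)]
        # weight tying: the output head is the same list object, not a copy
        self.lm_head = self.wte

    def embed(self, token_id):
        # look up one row of the shared matrix
        return self.wte[token_id]

    def logits(self, hidden):
        # logits[v] = dot(hidden, lm_head[v]), i.e. hidden @ W^T
        return [sum(h * w for h, w in zip(hidden, row)) for row in self.lm_head]

    def num_params(self):
        # count each distinct matrix once, so the tied head adds no parameters
        unique = {id(m): m for m in (self.wte, self.lm_head)}
        return sum(len(m) * len(m[0]) for m in unique.values())


model = TinyTiedLM(vocab_size=5, n_embd=3)
model.wte[2][0] = 1.0          # an update through the embedding table...
print(model.lm_head[2][0])     # ...is visible through the head: 1.0
print(model.num_params())      # 15 rather than 30
```

With untied weights the same toy model would hold two 5 × 3 matrices (30 values); tying keeps one.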
Andrej Karpathy | 8f85b83347 | inference time mini-optimization low-hanging fruit ty @jxtps for raising: when we are running inference we can apply lm_head on only the very last token | 2023-01-12 06:02:50 +00:00
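The saving comes from the fact that training needs logits at every position (each position has a target), while generation only samples the token that follows the last position. Each projection costs on the order of `vocab_size × n_embd` multiply-adds, so skipping the other T − 1 positions removes most of the head's work at inference. The sketch below is a stdlib-only illustration under that assumption; `project` and `lm_head_forward` are hypothetical helpers. In PyTorch the same idea is usually written by slicing the last time step, e.g. `x[:, [-1], :]`, which keeps a time dimension of length 1.

```python
def project(hidden, weight):
    # logits for one position: logits[v] = dot(hidden, weight[v])
    return [sum(h * w for h, w in zip(hidden, row)) for row in weight]


def lm_head_forward(hidden_states, weight, training):
    """Hypothetical helper. hidden_states holds T vectors, one per position."""
    if training:
        # the loss uses a target at every position, so project all T states
        return [project(h, weight) for h in hidden_states]
    # generation only samples the token after the last position, so project
    # just hidden_states[-1]; the outer list keeps a time dimension of length 1
    return [project(hidden_states[-1], weight)]


weight = [[1, 0], [0, 1], [1, 1]]        # vocab_size=3, n_embd=2
hidden_states = [[1, 2], [3, 4]]         # T=2 positions
print(lm_head_forward(hidden_states, weight, training=True))   # [[1, 2, 3], [3, 4, 7]]
print(lm_head_forward(hidden_states, weight, training=False))  # [[3, 4, 7]]
```

The inference result is exactly the last row of the training result, so sampling is unaffected.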
Andrej Karpathy | 177d5f7dc5 | disabling torch.jit.script here for massive performance boost when using torch.compile, our default. see issue #11. thanks @vgoklani for flagging | 2023-01-02 23:05:01 +00:00
Andrej Karpathy | 2febf4463c | candidate changes to apis, have to think through more | 2023-01-01 01:29:48 +00:00
ankandrew | 7f0e6d9a71 | Frozen GPTConfig | 2022-12-29 17:07:19 -03:00
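One common way to freeze a config object in Python is `@dataclass(frozen=True)`, which makes attribute assignment raise `dataclasses.FrozenInstanceError` and also makes instances hashable. The sketch below assumes that approach; the field names and GPT-2-small-like defaults are illustrative assumptions, not a copy of the commit. Variants are derived with `dataclasses.replace`, which returns a new instance and leaves the original untouched.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class GPTConfig:
    # field names and defaults are illustrative assumptions (GPT-2 small-like)
    block_size: int = 1024
    vocab_size: int = 50257
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768
    dropout: float = 0.1


cfg = GPTConfig()
try:
    cfg.n_layer = 6          # in-place mutation is rejected
except FrozenInstanceError:
    print("GPTConfig is immutable")

# derive a variant instead; the original stays unchanged
small = replace(cfg, n_layer=6, n_head=6, n_embd=384)
print(cfg.n_layer, small.n_layer)   # 12 6
```

Freezing guards against a config being mutated after the model has been built from it, which would leave the object describing a model that no longer exists.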
Andrej Karpathy | fe8042867c | first very bad commit | 2022-12-28 00:58:19 +00:00