nanogpt-experiments

mirror of https://github.com/osmarks/nanogpt-experiments.git synced 2024-11-10 20:09:58 +00:00

Author	SHA1	Message	Date
Andrej Karpathy	43b37fd568	reverse the order, making sure that the final layer init is preserved, and becomes the token embedding instead of the other way around. otherwise the loss can be all messed up from a bad init	2023-01-14 02:16:10 +00:00
Andrej Karpathy	7c8288552b	tie the weights of lm_head.weight and transformer.wte.weight, i.e. the last linear layer of decoder and the token embeddings.	2023-01-14 01:00:55 +00:00
Andrej Karpathy	8f85b83347	inference time mini-optimization low-hanging fruit ty @jxtps for raising: when we are running inference we can apply lm_head on only the very last token	2023-01-12 06:02:50 +00:00
Andrej Karpathy	177d5f7dc5	disabling torch.jit.script here for massive performance boost when using torch.compile, our default. see issue #11 . thanks @vgoklani for flagging	2023-01-02 23:05:01 +00:00
Andrej Karpathy	2febf4463c	candidate changes to apis, have to think through more	2023-01-01 01:29:48 +00:00
ankandrew	7f0e6d9a71	Frozen GPTConfig	2022-12-29 17:07:19 -03:00
Andrej Karpathy	fe8042867c	first very bad commit	2022-12-28 00:58:19 +00:00
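
Commits 7c8288552b, 43b37fd568 and 8f85b83347 describe weight tying between lm_head and the token embedding, the direction of that tie, and applying lm_head to only the last token at inference time. The following is a minimal sketch of those three ideas, not the repository's model code: the class name TinyLM, the sizes, and the flat wte attribute (the commits refer to transformer.wte) are hypothetical, and the transformer blocks are omitted. It assumes PyTorch is installed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyLM(nn.Module):
    """Hypothetical stand-in for the model; the transformer blocks are left out."""

    def __init__(self, vocab_size=50, n_embd=16):
        super().__init__()
        self.wte = nn.Embedding(vocab_size, n_embd)               # token embedding
        self.lm_head = nn.Linear(n_embd, vocab_size, bias=False)  # final projection
        # Weight tying: both modules share a single (vocab_size, n_embd) parameter.
        # The assignment direction matters: the embedding takes over lm_head's
        # parameter, so the final layer's init is the one that is preserved.
        self.wte.weight = self.lm_head.weight

    def forward(self, idx, targets=None):
        x = self.wte(idx)  # (B, T, n_embd); a real model would run its blocks here
        if targets is not None:
            # Training: logits are needed at every position to compute the loss.
            logits = self.lm_head(x)
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
            return logits, loss
        # Inference: only the last position is used for sampling, so project just
        # that one. Indexing with [-1] (a list) keeps the time dimension.
        logits = self.lm_head(x[:, [-1], :])
        return logits, None
```

Calling the model without targets returns logits of shape (B, 1, vocab_size), while passing targets returns logits for all T positions together with the loss. Because the parameter is shared, model.parameters() yields it only once.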