Andrej Karpathy
|
7c8288552b
|
tie the weights of lm_head.weight and transformer.wte.weight, i.e. the last linear layer of the decoder and the token embeddings.
|
2023-01-14 01:00:55 +00:00 |
|
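The tying above can be illustrated without any framework. A minimal pure-Python sketch under toy sizes, where `Embedding` and `Linear` are hypothetical stand-ins for the real `nn.Embedding`/`nn.Linear` modules: tying means both layers reference the *same* matrix object, not a copy, so a change through either is visible to both and the parameter count for those two layers is halved.

```python
# Hypothetical stand-ins for nn.Embedding / nn.Linear, just to show sharing.
class Embedding:
    def __init__(self, weight):
        self.weight = weight  # shape (vocab_size, n_embd)

class Linear:
    def __init__(self, weight):
        self.weight = weight  # shape (vocab_size, n_embd), used transposed

vocab_size, n_embd = 3, 4  # toy sizes
wte = Embedding([[0.0] * n_embd for _ in range(vocab_size)])
lm_head = Linear(wte.weight)      # tie: share the object, don't copy it
lm_head.weight[0][0] = 1.0        # an "update" through the head...
assert wte.weight[0][0] == 1.0    # ...is visible in the embeddings too
```

In torch the same effect comes from assigning one module's `.weight` Parameter to the other, so the optimizer sees a single tensor.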
Andrej Karpathy
|
32b4f08d9d
|
it's true
|
2023-01-13 23:43:00 +00:00 |
|
Andrej Karpathy
|
3e0fd42579
|
more scaling laws, clarification, and add simple interpolation of Approach 2
|
2023-01-13 00:57:15 +00:00 |
|
Andrej Karpathy
|
8f85b83347
|
inference-time mini-optimization, low-hanging fruit. ty @jxtps for raising this: when running inference we can apply lm_head to only the very last token
|
2023-01-12 06:02:50 +00:00 |
|
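The optimization is worth a sketch. During generation only the last position's logits are needed to sample the next token, so the output projection can be applied to one row instead of all T rows. A toy pure-Python illustration (sizes and the `project` helper are made up for the example):

```python
T, n_embd, vocab = 8, 4, 10
x = [[0.1] * n_embd for _ in range(T)]        # hidden states, one per position
W = [[0.01] * n_embd for _ in range(vocab)]   # stand-in for the lm_head weight

def project(h):
    """Project one hidden state (n_embd,) to logits (vocab,)."""
    return [sum(w_i * h_i for w_i, h_i in zip(row, h)) for row in W]

# training: project every position; inference: project only the last one,
# doing 1/T of the lm_head work per generated token
logits_last = project(x[-1])
assert len(logits_last) == vocab
```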
Andrej Karpathy
|
e21cbf887f
|
meant to set always_save_checkpoint to False instead, so we only write when val improves
|
2023-01-12 05:47:34 +00:00 |
|
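The intended policy reduces to one condition. A hedged sketch of the save decision (function name is illustrative, not the repo's): write a checkpoint only when validation loss improves, unless `always_save_checkpoint` forces a write every eval.

```python
def should_save(val_loss, best_val_loss, always_save_checkpoint=False):
    # save on improvement, or unconditionally if the flag is set
    return val_loss < best_val_loss or always_save_checkpoint

assert should_save(3.0, 3.5) is True                                # improved
assert should_save(3.6, 3.5) is False                               # worse: skip
assert should_save(3.6, 3.5, always_save_checkpoint=True) is True   # forced
```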
Andrej Karpathy
|
c1ac2d58f1
|
including transformers as a dependency of the repo as well
|
2023-01-12 02:42:38 +00:00 |
|
Andrej Karpathy
|
7f51d17977
|
add note about windows and pytorch 2.0 and torch compile in general
|
2023-01-12 02:17:52 +00:00 |
|
Andrej Karpathy
|
bb49751439
|
oh no, nanoGPT is trending; quickly explain the character-level functionality I added late last night
|
2023-01-11 17:11:15 +00:00 |
|
Andrej Karpathy
|
d17350a31d
|
add support for character-level language models, a new character-level shakespeare dataset, a new config file that shows how to train a character-level baby GPT on it, and adjust the sample function to figure out if it should decode with characters or GPT2 bpe tokens. The current implementation is a bit hacky and basically assumes just these two possibilities. In the future we may want to support more general encoders or decoders.
|
2023-01-11 05:27:19 +00:00 |
|
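The character-level codec is simple enough to sketch in full (variable names follow the common `stoi`/`itos` convention; the corpus here is a toy string): the vocabulary is just the sorted set of characters seen in the training text, and encode/decode are dictionary lookups. A prepare script can persist these tables so the sampler knows to decode with characters rather than GPT-2 BPE tokens.

```python
text = "hello world"                           # toy stand-in for the corpus
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}   # char -> int
itos = {i: ch for ch, i in stoi.items()}       # int -> char

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

assert decode(encode("hello")) == "hello"      # round-trips losslessly
```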
Andrej Karpathy
|
c2a402f7f7
|
guess the config from globals() and log all of it with wandb
|
2023-01-11 01:00:22 +00:00 |
|
Andrej Karpathy
|
8b2e622b27
|
adjust the readme to reflect changes in the autocast branch
|
2023-01-08 19:40:46 +00:00 |
|
Andrej Karpathy
|
b77c2e86d3
|
copy-pasting what seems to work into bench and sample as well. ty @lantiga
|
2023-01-08 19:32:13 +00:00 |
|
Andrej Karpathy
|
a855d316fd
|
add device and dtype support to train.py args
|
2023-01-08 19:20:38 +00:00 |
|
Andrej
|
e7cd674ce7
|
Merge pull request #20 from lantiga/wandb-optional-import
Make wandb import conditioned to wandb_log=True
|
2023-01-08 10:19:40 -08:00 |
|
Luca Antiga
|
09f1f458e8
|
Move conditional import
|
2023-01-08 15:51:50 +01:00 |
|
Luca Antiga
|
aba47f0a35
|
Make wandb import conditioned to wandb_log=True
|
2023-01-08 15:42:08 +01:00 |
|
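The pattern behind these two commits is the deferred import: move `import wandb` inside the branch guarded by the config flag, so the package is only required when logging is actually enabled. A minimal sketch (project name is illustrative):

```python
wandb_log = False  # config flag; default off

if wandb_log:
    import wandb  # deferred: only resolved when logging is turned on
    wandb.init(project="nanogpt")  # hypothetical project name

# with wandb_log=False, the script runs fine with wandb not even installed
assert "wandb" not in globals()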
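The pattern behind these two commits is the deferred import: move `import wandb` inside the branch guarded by the config flag, so the package is only required when logging is actually enabled. A minimal sketch (project name is illustrative):

```python
wandb_log = False  # config flag; default off

if wandb_log:
    import wandb  # deferred: only resolved when logging is turned on
    wandb.init(project="nanogpt")  # hypothetical project name
```

With `wandb_log=False` the script runs fine even when wandb is not installed, which is the point of the change.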
Andrej Karpathy
|
e53b9d28ff
|
ran readme through spellchecker heh
|
2023-01-08 01:46:54 +00:00 |
|
Andrej Karpathy
|
df3b8a57ab
|
tune the readme with new header image and the loss curve for 124M
|
2023-01-08 00:41:14 +00:00 |
|
Andrej Karpathy
|
d56bdf05a6
|
progress! based on chinchilla author correspondence
|
2023-01-07 02:42:30 +00:00 |
|
Andrej Karpathy
|
27fc6a4112
|
small tweaks to notebook
|
2023-01-06 02:13:04 +00:00 |
|
Andrej Karpathy
|
69d1a5f1af
|
update scaling laws. basically i can't reproduce any of params, flops, or scaling laws of the Chinchilla paper atm...
|
2023-01-06 02:01:08 +00:00 |
|
Andrej Karpathy
|
9629093e53
|
minor args re-arranging and removing some spurious ones like wandb entity. ty @tcapelle
|
2023-01-05 01:14:02 +00:00 |
|
Andrej
|
529c967a65
|
Merge pull request #19 from nat/patch-1
Strip unwanted prefix from state keys when loading model in sample.py
|
2023-01-04 16:46:32 -08:00 |
|
Andrej Karpathy
|
d562b3e550
|
shuttling the poor man's configurator aside into its own file and adding it to all of train, sample, bench. because i am leaving args in globals() so i can avoid having to prepend every single variable with an args., i have to exec the configurator and the optional configs. so we're left with something very gross by standard convention, but also quite simple and functional. *ducks*
|
2023-01-05 00:44:35 +00:00 |
|
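The configurator trick is easier to see in miniature. A rough sketch under assumptions (the parsing below is simplified; the real file execs a script): overrides arrive as `--key=value` strings and are written straight into `globals()`, so the training code keeps using bare names like `batch_size` with no `args.` prefix.

```python
from ast import literal_eval

batch_size = 12  # default, defined at module level like any other config var

arg = "--batch_size=4"                 # hypothetical command-line override
key, val = arg[2:].split("=")
assert key in globals(), f"unknown config key: {key}"  # reject typos
globals()[key] = literal_eval(val)     # mutate the module-level name directly

assert batch_size == 4                 # the bare name now sees the override
```

Gross by convention, as the message says, but it keeps every config variable a plain global.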
Nat Friedman
|
2b9e168736
|
Strip unwanted prefix from state keys when loading model
|
2023-01-04 16:39:30 -08:00 |
|
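The fix is a small key rewrite. Checkpoints saved from a `torch.compile`-wrapped model carry a `"_orig_mod."` prefix on every state-dict key (the compile wrapper stores the original module under that attribute), so loading into a plain model requires stripping it first. A self-contained sketch (the helper name is illustrative):

```python
def strip_prefix(state_dict, unwanted_prefix="_orig_mod."):
    """Return a copy of state_dict with the unwanted key prefix removed."""
    return {
        k[len(unwanted_prefix):] if k.startswith(unwanted_prefix) else k: v
        for k, v in state_dict.items()
    }

sd = {"_orig_mod.transformer.wte.weight": 1, "lm_head.weight": 2}
clean = strip_prefix(sd)
assert "transformer.wte.weight" in clean   # prefix stripped
assert "lm_head.weight" in clean           # unprefixed keys untouched
```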
Andrej Karpathy
|
ab04701f9f
|
mention current 8GPU SOTA and shuffle sections a bit
|
2023-01-04 18:59:10 +00:00 |
|
Andrej
|
1eefbb2520
|
Merge pull request #16 from jorahn/patch-1
Update README.md
|
2023-01-04 09:08:50 -08:00 |
|
Jonathan Rahn
|
26aa5f3ead
|
Update README.md
|
2023-01-04 10:28:13 +01:00 |
|
Andrej Karpathy
|
c72ecf5d93
|
add a notebook trying to reproduce chinchilla scaling laws. I can't get the numbers to be exactly right, have to look into it more
|
2023-01-04 00:59:34 +00:00 |
|
Andrej Karpathy
|
5acba4b005
|
ty lambda labs
|
2023-01-03 21:16:07 +00:00 |
|
Andrej Karpathy
|
97fc42616e
|
adding a few more dependencies
|
2023-01-03 17:54:48 +00:00 |
|
Andrej Karpathy
|
9f95aca93e
|
better hyperparams for gpt2 124M model on A100 40GB. still uncertain about max_iters especially, and a bit about weight decay, betas
|
2023-01-03 17:45:49 +00:00 |
|
Andrej Karpathy
|
b45eec3e4b
|
flesh out the remaining TODOs in readme a bit more
|
2023-01-03 07:41:28 +00:00 |
|
Andrej Karpathy
|
177d5f7dc5
|
disabling torch.jit.script here for massive performance boost when using torch.compile, our default. see issue #11. thanks @vgoklani for flagging
|
2023-01-02 23:05:01 +00:00 |
|
Andrej Karpathy
|
ea4de192e0
|
reshuffle args inside sample.py
|
2023-01-02 02:11:39 +00:00 |
|
Andrej Karpathy
|
ec9b1f8182
|
add a patch to fix mysterious unwanted prefix in state dict? maybe remove later
|
2023-01-02 01:25:02 +00:00 |
|
Andrej Karpathy
|
41184a27f5
|
rename compile_model to compile, shorter. version 2: catch the stragglers
|
2023-01-02 01:15:55 +00:00 |
|
Andrej Karpathy
|
35f51974c4
|
rename to compile, it's shorter
|
2023-01-02 01:14:46 +00:00 |
|
Andrej Karpathy
|
2febf4463c
|
candidate changes to apis, have to think through more
|
2023-01-01 01:29:48 +00:00 |
|
Andrej Karpathy
|
7c6ea8409e
|
simplify the prepare script a lot; write using only one process, which seems sufficient for now. ty @LaihoE for the suggestion and @proger for flagging
|
2022-12-30 22:18:20 +00:00 |
|
Andrej Karpathy
|
d8abd21258
|
typo fix in readme
|
2022-12-30 00:07:58 +00:00 |
|
Andrej Karpathy
|
5a725d9098
|
add torch.compile by default, shows almost a 1.8X improvement in throughput, nice
|
2022-12-30 00:07:13 +00:00 |
|
Andrej
|
fb52554ca8
|
Merge pull request #1 from ankandrew/master
Minor Frozen GPTConfig
|
2022-12-29 13:45:20 -08:00 |
|
ankandrew
|
7f0e6d9a71
|
Frozen GPTConfig
|
2022-12-29 17:07:19 -03:00 |
|
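A frozen config is one decorator argument. A sketch with `dataclasses` (the field names follow nanoGPT's GPTConfig but the set shown here is illustrative): `frozen=True` makes instances immutable, so a config cannot be mutated accidentally after the model is constructed from it.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class GPTConfig:
    block_size: int = 1024
    vocab_size: int = 50257
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768

cfg = GPTConfig(n_layer=6)
try:
    cfg.n_layer = 24           # any assignment to a frozen instance raises
except FrozenInstanceError:
    pass
assert cfg.n_layer == 6        # the original value is untouched
```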
Andrej Karpathy
|
682a0ac8f1
|
properly resume training, also loading iter_num and best_val_loss from checkpoints
|
2022-12-29 18:23:15 +00:00 |
|
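The point of this commit generalizes: a resumable checkpoint must carry the training-loop state, not just the weights, or a resumed run restarts its learning-rate schedule at step 0 and immediately clobbers the "best" checkpoint. A sketch of the idea with an in-memory dict (the layout mirrors the commit's description, but the values are made up):

```python
# hypothetical checkpoint contents; "model" would hold real tensors
checkpoint = {
    "model": {"lm_head.weight": "..."},  # placeholder for parameter tensors
    "iter_num": 5000,                    # where the schedule left off
    "best_val_loss": 3.11,               # so save-on-improvement still works
}

# on resume: restore loop state alongside the parameters
iter_num = checkpoint["iter_num"]
best_val_loss = checkpoint["best_val_loss"]
assert (iter_num, best_val_loss) == (5000, 3.11)
```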
Andrej Karpathy
|
f88aa2c2fe
|
add link to mingpt
|
2022-12-29 17:38:33 +00:00 |
|
Andrej Karpathy
|
f2fc4be69b
|
mention 4gpu loss as well in readme
|
2022-12-29 17:26:42 +00:00 |
|
Andrej Karpathy
|
fa57d464d7
|
pull out dtype up top
|
2022-12-29 05:32:55 +00:00 |
|
Andrej Karpathy
|
e7bac659f5
|
oops missed one # have to fix
|
2022-12-29 05:24:14 +00:00 |
|
Andrej Karpathy
|
97e2ab1b8d
|
enhance readme, add some todos
|
2022-12-29 05:23:36 +00:00 |
|