nanogpt-experiments

mirror of https://github.com/osmarks/nanogpt-experiments.git synced 2025-04-26 20:53:10 +00:00

Author	SHA1	Message	Date
MicroPanda123	d5ee965974	Update README.md	2023-01-15 20:29:15 +00:00
Andrej Karpathy	cf99914886	add gradient accumulation support to simulate larger batch sizes. ty @VHellendoorn for original PR	2023-01-15 17:49:55 +00:00
Andrej Karpathy	89da79eee1	add note of caution for the produced warning, investigate later	2023-01-14 20:38:22 +00:00
Andrej Karpathy	7d7ded25ce	a bit better settings... for a single gpu at least. these settings would fry a simple cpu though i think	2023-01-14 03:59:53 +00:00
Andrej Karpathy	91d02510ce	fix bug... if topk > vocab_size, torch.topk will throw error	2023-01-14 03:57:00 +00:00
Andrej Karpathy	57735f532d	correctly propagate the vocab_size from the rendered dataset into the model args	2023-01-14 02:26:44 +00:00
Andrej Karpathy	43b37fd568	reverse the order, making sure that the final layer init is preserved, and becomes the token embedding instead of the other way around. otherwise the loss can be all messed up from a bad init	2023-01-14 02:16:10 +00:00
Andrej Karpathy	7c8288552b	tie the weights of lm_head.weight and transformer.wte.weight, i.e. the last linear layer of decoder and the token embeddings.	2023-01-14 01:00:55 +00:00
Andrej Karpathy	32b4f08d9d	it's true	2023-01-13 23:43:00 +00:00
Andrej Karpathy	3e0fd42579	more scaling laws, clarification, and add simple interpolation of Approach 2	2023-01-13 00:57:15 +00:00
Andrej Karpathy	8f85b83347	inference time mini-optimization low-hanging fruit ty @jxtps for raising: when we are running inference we can apply lm_head on only the very last token	2023-01-12 06:02:50 +00:00
Andrej Karpathy	e21cbf887f	meant to set always_save_checkpoint to False instead, so we only write when val improves	2023-01-12 05:47:34 +00:00
Andrej Karpathy	c1ac2d58f1	including transformers as a dependency of the repo as well	2023-01-12 02:42:38 +00:00
Andrej Karpathy	7f51d17977	add note about windows and pytorch 2.0 and torch compile in general	2023-01-12 02:17:52 +00:00
Andrej Karpathy	bb49751439	oh no nanoGPT is trending quickly explain the character-level functionality I added late last night	2023-01-11 17:11:15 +00:00
Andrej Karpathy	d17350a31d	add support for character-level language models, a new character-level shakespeare dataset, a new config file that shows how to train a character-level baby GPT on it, and adjust the sample function to figure out if it should decode with characters or GPT2 bpe tokens. The current implementation is a bit hacky and basically assumes just these two possibilities. In the future we may want to support more general encoders or decoders.	2023-01-11 05:27:19 +00:00
Andrej Karpathy	c2a402f7f7	guess the config from globals() and log all of it with wandb	2023-01-11 01:00:22 +00:00
Andrej Karpathy	8b2e622b27	adjust the readme to reflect changes in the autocast branch	2023-01-08 19:40:46 +00:00
Andrej Karpathy	b77c2e86d3	copy pasting what seems to work to bench,sample as well. ty @lantiga	2023-01-08 19:32:13 +00:00
Andrej Karpathy	a855d316fd	add device and dtype support to train.py args	2023-01-08 19:20:38 +00:00
Andrej	e7cd674ce7	Merge pull request #20 from lantiga/wandb-optional-import Make wandb import conditioned to wandb_log=True	2023-01-08 10:19:40 -08:00
Luca Antiga	09f1f458e8	Move conditional import	2023-01-08 15:51:50 +01:00
Luca Antiga	aba47f0a35	Make wandb import conditioned to wandb_log=True	2023-01-08 15:42:08 +01:00
Andrej Karpathy	e53b9d28ff	ran readme through spellchecker heh	2023-01-08 01:46:54 +00:00
Andrej Karpathy	df3b8a57ab	tune the readme with new header image and the loss curve for 124M	2023-01-08 00:41:14 +00:00
Andrej Karpathy	d56bdf05a6	progress! based on chinchilla author correspondence	2023-01-07 02:42:30 +00:00
Andrej Karpathy	27fc6a4112	small tweaks to notebook	2023-01-06 02:13:04 +00:00
Andrej Karpathy	69d1a5f1af	update scaling laws. basically i can't reproduce any of params, flops, or scaling laws of the Chinchilla paper atm...	2023-01-06 02:01:08 +00:00
Andrej Karpathy	9629093e53	minor args re-arranging and removing some spurious ones like wandb entity ty @tcapelle	2023-01-05 01:14:02 +00:00
Andrej	529c967a65	Merge pull request #19 from nat/patch-1 Strip unwanted prefix from state keys when loading model in sample.py	2023-01-04 16:46:32 -08:00
Andrej Karpathy	d562b3e550	shuttling the poor mans configurator aside into its own file and adding it to all of train,sample,bench. because i am leaving args in globals() so i can avoid having to prepend every single variable with an args., i have to exec the configurator and the optional configs. so we're left with something very gross by standard convention but also quite simple and functional. ducks	2023-01-05 00:44:35 +00:00
Nat Friedman	2b9e168736	Strip unwanted prefix from state keys when loading model	2023-01-04 16:39:30 -08:00
Andrej Karpathy	ab04701f9f	mention current 8GPU SOTA and shuffle sections a bit	2023-01-04 18:59:10 +00:00
Andrej	1eefbb2520	Merge pull request #16 from jorahn/patch-1 Update README.md	2023-01-04 09:08:50 -08:00
Jonathan Rahn	26aa5f3ead	Update README.md	2023-01-04 10:28:13 +01:00
Andrej Karpathy	c72ecf5d93	add a notebook trying to reproduce chinchilla scaling laws. I can't get the numbers to be exactly right, have to look at more	2023-01-04 00:59:34 +00:00
Andrej Karpathy	5acba4b005	ty lambda labs	2023-01-03 21:16:07 +00:00
Andrej Karpathy	97fc42616e	adding few more dependencies	2023-01-03 17:54:48 +00:00
Andrej Karpathy	9f95aca93e	better hyperparams for gpt2 124M model on A100 40GB. still uncertain about max_iters especially, and a bit about weight decay, betas	2023-01-03 17:45:49 +00:00
Andrej Karpathy	b45eec3e4b	flesh out the remaining TODOs in readme a bit more	2023-01-03 07:41:28 +00:00
Andrej Karpathy	177d5f7dc5	disabling torch.jit.script here for massive performance boost when using torch.compile, our default. see issue #11 . thanks @vgoklani for flagging	2023-01-02 23:05:01 +00:00
Andrej Karpathy	ea4de192e0	reshuffle args inside sample.py	2023-01-02 02:11:39 +00:00
Andrej Karpathy	ec9b1f8182	add a patch to fix mysterious unwanted prefix in state dict? maybe remove later	2023-01-02 01:25:02 +00:00
Andrej Karpathy	41184a27f5	rename compile_model to compile, shroter, version 2 stragglers	2023-01-02 01:15:55 +00:00
Andrej Karpathy	35f51974c4	rename to compile it's shorter	2023-01-02 01:14:46 +00:00
Andrej Karpathy	2febf4463c	candidate changes to apis, have to think through more	2023-01-01 01:29:48 +00:00
Andrej Karpathy	7c6ea8409e	simplify the prepare script a lot, write only using one process, seems sufficient for now. ty @LaihoE for suggestion and @proger for flagging	2022-12-30 22:18:20 +00:00
Andrej Karpathy	d8abd21258	typo fix in readme	2022-12-30 00:07:58 +00:00
Andrej Karpathy	5a725d9098	add torch.compile by default, shows almost 1.8X improvement in throughput nice	2022-12-30 00:07:13 +00:00
Andrej	fb52554ca8	Merge pull request #1 from ankandrew/master Minor Frozen GPTConfig	2022-12-29 13:45:20 -08:00

66 Commits