nanogpt-experiments

mirror of https://github.com/osmarks/nanogpt-experiments.git synced 2024-12-18 14:10:28 +00:00

Author	SHA1	Message	Date
Andrej Karpathy	aa8e4c2546	screwed up the link, fix	2023-01-18 03:11:31 +00:00
Andrej Karpathy	6dab32c003	experimenting with badges, and discord link to start specifically. issues sometimes feel a little too heavy	2023-01-18 03:09:42 +00:00
リョウゼ	be571fff2c	Improve readability of huge numbers Before: length of dataset in characters: 1115394 all the unique characters: !$&',-.3:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz vocab size: 65 train has 1003854 tokens val has 111540 tokens After: length of dataset in characters: 1,115,394 all the unique characters: !$&',-.3:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz vocab size: 65 train has 1,003,854 tokens val has 111,540 tokens	2023-01-16 22:05:32 +01:00
Andrej Karpathy	7f74652843	add docs on multinode training to main README too	2023-01-16 17:11:02 +00:00
Andrej Karpathy	46ce9971df	small tweaks to docs and variable names stylistically	2023-01-16 16:56:05 +00:00
Andrej Karpathy	684800dd87	clarify that these should be run on two separate machines	2023-01-16 06:02:46 +00:00
Andrej Karpathy	9352df23de	docs for multinode ddp	2023-01-16 05:57:33 +00:00
Andrej Karpathy	c3dddbff3d	get rid of gpu_id, the world is more complicated than that when world_size > 8	2023-01-16 05:44:50 +00:00
Andrej Karpathy	f5e6ac8b02	local rank -> rank	2023-01-16 05:13:13 +00:00
Andrej Karpathy	cf99914886	add gradient accumulation support to simulate larger batch sizes. ty @VHellendoorn for original PR	2023-01-15 17:49:55 +00:00
Andrej Karpathy	89da79eee1	add note of caution for the produced warning, investigate later	2023-01-14 20:38:22 +00:00
Andrej Karpathy	7d7ded25ce	a bit better settings... for a single gpu at least. these settings would fry a simple cpu though i think	2023-01-14 03:59:53 +00:00
Andrej Karpathy	91d02510ce	fix bug... if topk > vocab_size, torch.topk will throw error	2023-01-14 03:57:00 +00:00
Andrej Karpathy	57735f532d	correctly propagate the vocab_size from the rendered dataset into the model args	2023-01-14 02:26:44 +00:00
Andrej Karpathy	43b37fd568	reverse the order, making sure that the final layer init is preserved, and becomes the token embedding instead of the other way around. otherwise the loss can be all messed up from a bad init	2023-01-14 02:16:10 +00:00
Andrej Karpathy	7c8288552b	tie the weights of lm_head.weight and transformer.wte.weight, i.e. the last linear layer of decoder and the token embeddings.	2023-01-14 01:00:55 +00:00
Andrej Karpathy	32b4f08d9d	it's true	2023-01-13 23:43:00 +00:00
Andrej Karpathy	3e0fd42579	more scaling laws, clarification, and add simple interpolation of Approach 2	2023-01-13 00:57:15 +00:00
Andrej Karpathy	8f85b83347	inference time mini-optimization low-hanging fruit ty @jxtps for raising: when we are running inference we can apply lm_head on only the very last token	2023-01-12 06:02:50 +00:00
Andrej Karpathy	e21cbf887f	meant to set always_save_checkpoint to False instead, so we only write when val improves	2023-01-12 05:47:34 +00:00
Andrej Karpathy	c1ac2d58f1	including transformers as a dependency of the repo as well	2023-01-12 02:42:38 +00:00
Andrej Karpathy	7f51d17977	add note about windows and pytorch 2.0 and torch compile in general	2023-01-12 02:17:52 +00:00
Andrej Karpathy	bb49751439	oh no nanoGPT is trending quickly explain the character-level functionality I added late last night	2023-01-11 17:11:15 +00:00
Andrej Karpathy	d17350a31d	add support for character-level language models, a new character-level shakespeare dataset, a new config file that shows how to train a character-level baby GPT on it, and adjust the sample function to figure out if it should decode with characters or GPT2 bpe tokens. The current implementation is a bit hacky and basically assumes just these two possibilities. In the future we may want to support more general encoders or decoders.	2023-01-11 05:27:19 +00:00
Andrej Karpathy	c2a402f7f7	guess the config from globals() and log all of it with wandb	2023-01-11 01:00:22 +00:00
Andrej Karpathy	8b2e622b27	adjust the readme to reflect changes in the autocast branch	2023-01-08 19:40:46 +00:00
Andrej Karpathy	b77c2e86d3	copy pasting what seems to work to bench,sample as well. ty @lantiga	2023-01-08 19:32:13 +00:00
Andrej Karpathy	a855d316fd	add device and dtype support to train.py args	2023-01-08 19:20:38 +00:00
Andrej	e7cd674ce7	Merge pull request #20 from lantiga/wandb-optional-import Make wandb import conditioned to wandb_log=True	2023-01-08 10:19:40 -08:00
Luca Antiga	09f1f458e8	Move conditional import	2023-01-08 15:51:50 +01:00
Luca Antiga	aba47f0a35	Make wandb import conditioned to wandb_log=True	2023-01-08 15:42:08 +01:00
Andrej Karpathy	e53b9d28ff	ran readme through spellchecker heh	2023-01-08 01:46:54 +00:00
Andrej Karpathy	df3b8a57ab	tune the readme with new header image and the loss curve for 124M	2023-01-08 00:41:14 +00:00
Andrej Karpathy	d56bdf05a6	progress! based on chinchilla author correspondence	2023-01-07 02:42:30 +00:00
Andrej Karpathy	27fc6a4112	small tweaks to notebook	2023-01-06 02:13:04 +00:00
Andrej Karpathy	69d1a5f1af	update scaling laws. basically i can't reproduce any of params, flops, or scaling laws of the Chinchilla paper atm...	2023-01-06 02:01:08 +00:00
Andrej Karpathy	9629093e53	minor args re-arranging and removing some spurious ones like wandb entity ty @tcapelle	2023-01-05 01:14:02 +00:00
Andrej	529c967a65	Merge pull request #19 from nat/patch-1 Strip unwanted prefix from state keys when loading model in sample.py	2023-01-04 16:46:32 -08:00
Andrej Karpathy	d562b3e550	shuttling the poor mans configurator aside into its own file and adding it to all of train,sample,bench. because i am leaving args in globals() so i can avoid having to prepend every single variable with an args., i have to exec the configurator and the optional configs. so we're left with something very gross by standard convention but also quite simple and functional. ducks	2023-01-05 00:44:35 +00:00
Nat Friedman	2b9e168736	Strip unwanted prefix from state keys when loading model	2023-01-04 16:39:30 -08:00
Andrej Karpathy	ab04701f9f	mention current 8GPU SOTA and shuffle sections a bit	2023-01-04 18:59:10 +00:00
Andrej	1eefbb2520	Merge pull request #16 from jorahn/patch-1 Update README.md	2023-01-04 09:08:50 -08:00
Jonathan Rahn	26aa5f3ead	Update README.md	2023-01-04 10:28:13 +01:00
Andrej Karpathy	c72ecf5d93	add a notebook trying to reproduce chinchilla scaling laws. I can't get the numbers to be exactly right, have to look at more	2023-01-04 00:59:34 +00:00
Andrej Karpathy	5acba4b005	ty lambda labs	2023-01-03 21:16:07 +00:00
Andrej Karpathy	97fc42616e	adding few more dependencies	2023-01-03 17:54:48 +00:00
Andrej Karpathy	9f95aca93e	better hyperparams for gpt2 124M model on A100 40GB. still uncertain about max_iters especially, and a bit about weight decay, betas	2023-01-03 17:45:49 +00:00
Andrej Karpathy	b45eec3e4b	flesh out the remaining TODOs in readme a bit more	2023-01-03 07:41:28 +00:00
Andrej Karpathy	177d5f7dc5	disabling torch.jit.script here for massive performance boost when using torch.compile, our default. see issue #11 . thanks @vgoklani for flagging	2023-01-02 23:05:01 +00:00
Andrej Karpathy	ea4de192e0	reshuffle args inside sample.py	2023-01-02 02:11:39 +00:00
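
Commit 2b9e168736 above ("Strip unwanted prefix from state keys when loading model") can be illustrated with a minimal sketch. The log does not say which prefix was stripped, so the `_orig_mod.` value below is an assumption (it is the prefix that wrapping a model with torch.compile typically adds to state-dict keys), and `strip_prefix` is a hypothetical helper, not code from the repository. Plain dicts stand in for a real checkpoint.

```python
def strip_prefix(state_dict: dict, prefix: str) -> dict:
    """Return a copy of state_dict with `prefix` removed from keys that start with it."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Assumed prefix; the commit message does not name it.
UNWANTED_PREFIX = "_orig_mod."

# Placeholder values stand in for tensors.
checkpoint = {
    "_orig_mod.transformer.wte.weight": "tensor-a",
    "_orig_mod.lm_head.weight": "tensor-b",
    "already.clean.key": "tensor-c",
}

cleaned = strip_prefix(checkpoint, UNWANTED_PREFIX)
print(sorted(cleaned))
```

Keys without the prefix pass through unchanged, so the helper is safe to apply to checkpoints saved with or without the wrapper.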

124 Commits
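
The "Improve readability of huge numbers" commit (be571fff2c) shows its before and after output but not the code change. One way to get that output in Python is the `,` format specifier; the sketch below is an assumption about the approach, not the actual diff, and `describe_split` is a hypothetical helper name. The numbers are the ones quoted in the commit message.

```python
def describe_split(name: str, n_tokens: int) -> str:
    """Format a token count with thousands separators, e.g. 1003854 -> 1,003,854."""
    return f"{name} has {n_tokens:,} tokens"


dataset_chars = 1115394  # value quoted in the commit message

print(f"length of dataset in characters: {dataset_chars:,}")
print(describe_split("train", 1003854))
print(describe_split("val", 111540))
```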