DG
edb7a7eab0
use relative paths so that running the data prep scripts always creates files in the local folder, no matter where they are run from
2023-01-20 10:39:45 -08:00
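For context, a minimal sketch of what path handling like this usually looks like in a data prep script; the file names below are illustrative, not necessarily the ones the scripts use:

    import os

    # resolve paths relative to this script's own directory rather than the
    # current working directory, so output always lands next to the prep script
    here = os.path.dirname(__file__)
    input_file_path = os.path.join(here, 'input.txt')    # illustrative file name
    output_file_path = os.path.join(here, 'train.bin')   # illustrative file name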
Clive Chan
67166079c9
Zero-grad more aggressively to save memory
2023-01-19 22:10:44 -08:00
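A hedged sketch of the usual PyTorch idiom for the commit above; model, optimizer, X, and Y are placeholders. zero_grad(set_to_none=True) releases the gradient tensors instead of zero-filling them, and calling it right after the step keeps that memory freed for as long as possible:

    # one training step, freeing gradient memory as early as possible
    logits, loss = model(X, Y)                 # placeholder forward pass
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)      # grads become None, releasing their memory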
Andrej Karpathy
2c7806db6e
for consistency with previous commit
2023-01-19 23:10:51 +00:00
Andrej
c1c20a0311
Merge pull request #57 from ryouze/patch-1
Improve readability of huge numbers
2023-01-19 15:08:35 -08:00
Andrej
9e150b808e
Merge pull request #66 from PWhiddy/patch-1
fix typo (params -> tokens)
2023-01-18 22:29:51 -08:00
Peter Whidden
ff9085d0bc
fix typo (params -> tokens)
2023-01-18 21:17:15 -05:00
Andrej Karpathy
8dd2061e4d
fix temperature comment, which was slightly wrong
2023-01-18 16:10:05 +00:00
Andrej Karpathy
2b083fbfde
the badge is a bit ugly, move it down to the troubleshooting section
2023-01-18 03:16:59 +00:00
Andrej Karpathy
aa8e4c2546
screwed up the link, fix
2023-01-18 03:11:31 +00:00
Andrej Karpathy
6dab32c003
experimenting with badges, starting specifically with a discord link. issues sometimes feel a little too heavy
2023-01-18 03:09:42 +00:00
リョウゼ
be571fff2c
Improve readability of huge numbers
Before:
length of dataset in characters: 1115394
all the unique characters:
!$&',-.3:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
vocab size: 65
train has 1003854 tokens
val has 111540 tokens
After:
length of dataset in characters: 1,115,394
all the unique characters:
!$&',-.3:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
vocab size: 65
train has 1,003,854 tokens
val has 111,540 tokens
2023-01-16 22:05:32 +01:00
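The before/after output above is what Python's thousands-separator format spec produces; a tiny illustrative example:

    n_train = 1003854
    print(f"train has {n_train:,} tokens")   # -> train has 1,003,854 tokens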
Andrej Karpathy
7f74652843
add docs on multinode training to main README too
2023-01-16 17:11:02 +00:00
Andrej Karpathy
46ce9971df
small tweaks to docs and variable names stylistically
2023-01-16 16:56:05 +00:00
Andrej Karpathy
684800dd87
clarify that these should be run on two separate machines
2023-01-16 06:02:46 +00:00
Andrej Karpathy
9352df23de
docs for multinode ddp
2023-01-16 05:57:33 +00:00
Andrej Karpathy
c3dddbff3d
get rid of gpu_id, the world is more complicated than that when world_size > 8
2023-01-16 05:44:50 +00:00
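A rough sketch of the distinction the surrounding DDP commits deal with, assuming the environment variables that torchrun sets; once world_size exceeds a single node's 8 GPUs, the global rank and the within-node local rank are no longer interchangeable:

    import os
    import torch
    import torch.distributed as dist

    dist.init_process_group(backend='nccl')
    rank = int(os.environ['RANK'])               # unique across all processes on all nodes
    local_rank = int(os.environ['LOCAL_RANK'])   # index of the GPU within this node only
    world_size = int(os.environ['WORLD_SIZE'])   # total number of processes
    device = f'cuda:{local_rank}'                # bind each process to its own GPU
    torch.cuda.set_device(device)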
Andrej Karpathy
f5e6ac8b02
local rank -> rank
2023-01-16 05:13:13 +00:00
Andrej Karpathy
cf99914886
add gradient accumulation support to simulate larger batch sizes. ty @VHellendoorn for the original PR
2023-01-15 17:49:55 +00:00
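A hedged sketch of the technique, not necessarily the exact loop in train.py: run several forward/backward micro-steps, scale the loss so the accumulated gradients average out to one large batch, then take a single optimizer step. get_batch, model, and optimizer are placeholders:

    gradient_accumulation_steps = 4                   # illustrative value
    optimizer.zero_grad(set_to_none=True)
    for micro_step in range(gradient_accumulation_steps):
        X, Y = get_batch('train')                     # placeholder data loader
        logits, loss = model(X, Y)
        loss = loss / gradient_accumulation_steps     # scale so grads average over micro-steps
        loss.backward()                               # gradients accumulate across iterations
    optimizer.step()                                  # one step for the simulated large batch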
Andrej Karpathy
89da79eee1
add a note of caution for the produced warning, investigate later
2023-01-14 20:38:22 +00:00
Andrej Karpathy
7d7ded25ce
a bit better settings... for a single gpu at least. these settings would fry a simple cpu though, i think
2023-01-14 03:59:53 +00:00
Andrej Karpathy
91d02510ce
fix bug... if topk > vocab_size, torch.topk will throw an error
2023-01-14 03:57:00 +00:00
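The standard guard for that failure mode, sketched below (logits and top_k are the sampling-time variables, assumed to exist): torch.topk raises an error when k exceeds the size of the dimension being searched, so clamp k to the vocabulary size first.

    # during sampling: keep only the top_k most likely tokens
    k = min(top_k, logits.size(-1))               # never ask topk for more than vocab_size values
    v, _ = torch.topk(logits, k)
    logits[logits < v[:, [-1]]] = -float('Inf')   # mask out everything below the k-th logit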
Andrej Karpathy
57735f532d
correctly propagate the vocab_size from the rendered dataset into the model args
2023-01-14 02:26:44 +00:00
Andrej Karpathy
43b37fd568
reverse the order, making sure that the final layer init is preserved and becomes the token embedding, instead of the other way around; otherwise the loss can be all messed up from a bad init
2023-01-14 02:16:10 +00:00
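This entry and the one below are about weight tying; a minimal self-contained sketch of the idea (layer names and sizes are illustrative, not necessarily the ones in model.py). Because the assignment makes both modules share a single tensor, the direction of the assignment decides whose initialization survives, which is what the "reverse the order" fix above is about:

    import torch.nn as nn

    n_embd, vocab_size = 768, 65                          # illustrative sizes
    wte = nn.Embedding(vocab_size, n_embd)                # token embeddings
    lm_head = nn.Linear(n_embd, vocab_size, bias=False)   # final output projection
    # tie the weights: both layers now point at one parameter tensor.
    # the right-hand side's tensor is the one that is kept, so here the
    # lm_head init is preserved and the embedding adopts it
    wte.weight = lm_head.weight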
Andrej Karpathy
7c8288552b
tie the weights of lm_head.weight and transformer.wte.weight, i.e. the last linear layer of the decoder and the token embeddings.
2023-01-14 01:00:55 +00:00
Andrej Karpathy
32b4f08d9d
it's true
2023-01-13 23:43:00 +00:00
Andrej Karpathy
3e0fd42579
more scaling laws, clarification, and add simple interpolation of Approach 2
2023-01-13 00:57:15 +00:00
Andrej Karpathy
8f85b83347
inference-time mini-optimization, low-hanging fruit, ty @jxtps for raising it: when we are running inference we can apply lm_head to only the very last token
2023-01-12 06:02:50 +00:00
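A small self-contained sketch of that optimization, with illustrative sizes: during generation only the logits of the final position are needed, so the (large) output projection can be applied to just that one time step.

    import torch
    import torch.nn as nn

    n_embd, vocab_size = 768, 50304                 # illustrative sizes
    lm_head = nn.Linear(n_embd, vocab_size, bias=False)
    x = torch.randn(1, 128, n_embd)                 # (batch, time, n_embd) hidden states
    logits_all  = lm_head(x)                        # training: (1, 128, vocab_size)
    logits_last = lm_head(x[:, [-1], :])            # inference: (1, 1, vocab_size), much cheaper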
Andrej Karpathy
e21cbf887f
meant to set always_save_checkpoint to False instead, so we only write when val improves
2023-01-12 05:47:34 +00:00
Andrej Karpathy
c1ac2d58f1
including transformers as a dependency of the repo as well
2023-01-12 02:42:38 +00:00
Andrej Karpathy
7f51d17977
add note about windows and pytorch 2.0 and torch compile in general
2023-01-12 02:17:52 +00:00
Andrej Karpathy
bb49751439
oh no nanoGPT is trending, quickly explain the character-level functionality I added late last night
2023-01-11 17:11:15 +00:00
Andrej Karpathy
d17350a31d
add support for character-level language models, a new character-level shakespeare dataset, a new config file that shows how to train a character-level baby GPT on it, and adjust the sample function to figure out if it should decode with characters or GPT2 bpe tokens. The current implementation is a bit hacky and basically assumes just these two possibilities. In the future we may want to support more general encoders or decoders.
2023-01-11 05:27:19 +00:00
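A minimal sketch of what a character-level tokenizer over a text corpus looks like; the file path is illustrative:

    text = open('input.txt', 'r').read()             # illustrative path to the raw dataset
    chars = sorted(set(text))                        # the vocabulary is every distinct character
    stoi = {ch: i for i, ch in enumerate(chars)}     # char -> integer id
    itos = {i: ch for i, ch in enumerate(chars)}     # integer id -> char
    encode = lambda s: [stoi[c] for c in s]
    decode = lambda ids: ''.join(itos[i] for i in ids)
    print(f"vocab size: {len(chars)}")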
Andrej Karpathy
c2a402f7f7
guess the config from globals() and log all of it with wandb
2023-01-11 01:00:22 +00:00
Andrej Karpathy
8b2e622b27
adjust the readme to reflect changes in the autocast branch
2023-01-08 19:40:46 +00:00
Andrej Karpathy
b77c2e86d3
copy pasting what seems to work to bench and sample as well. ty @lantiga
2023-01-08 19:32:13 +00:00
Andrej Karpathy
a855d316fd
add device and dtype support to train.py args
2023-01-08 19:20:38 +00:00
Andrej
e7cd674ce7
Merge pull request #20 from lantiga/wandb-optional-import
Make wandb import conditional on wandb_log=True
2023-01-08 10:19:40 -08:00
Luca Antiga
09f1f458e8
Move conditional import
2023-01-08 15:51:50 +01:00
Luca Antiga
aba47f0a35
Make wandb import conditional on wandb_log=True
2023-01-08 15:42:08 +01:00
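A sketch of the pattern these two commits describe; the flag and project name are illustrative:

    wandb_log = False   # illustrative config flag
    if wandb_log:
        import wandb                         # only pay for the import when logging is enabled
        wandb.init(project='my-project')     # illustrative project name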
Andrej Karpathy
e53b9d28ff
ran readme through spellchecker heh
2023-01-08 01:46:54 +00:00
Andrej Karpathy
df3b8a57ab
tune the readme with new header image and the loss curve for 124M
2023-01-08 00:41:14 +00:00
Andrej Karpathy
d56bdf05a6
progress! based on chinchilla author correspondence
2023-01-07 02:42:30 +00:00
Andrej Karpathy
27fc6a4112
small tweaks to notebook
2023-01-06 02:13:04 +00:00
Andrej Karpathy
69d1a5f1af
update scaling laws. basically i can't reproduce any of the params, flops, or scaling laws of the Chinchilla paper atm...
2023-01-06 02:01:08 +00:00
Andrej Karpathy
9629093e53
minor args re-arranging and removal of some spurious ones like wandb entity. ty @tcapelle
2023-01-05 01:14:02 +00:00
Andrej
529c967a65
Merge pull request #19 from nat/patch-1
Strip unwanted prefix from state keys when loading model in sample.py
2023-01-04 16:46:32 -08:00
Andrej Karpathy
d562b3e550
shuttling the poor man's configurator aside into its own file and adding it to all of train, sample, and bench. because i am leaving args in globals() so i can avoid having to prepend every single variable with an args. prefix, i have to exec the configurator and the optional configs. so we're left with something very gross by standard convention but also quite simple and functional. *ducks*
2023-01-05 00:44:35 +00:00
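A loose sketch of the "poor man's configurator" pattern being described, with illustrative defaults: settings live as plain module-level variables, and overrides are exec'd straight into globals() so nothing needs an args. prefix.

    # defaults, as plain module-level variables
    batch_size = 12
    learning_rate = 6e-4

    # overrides from a config file and/or command-line key=value pairs are exec'd
    # into globals() -- gross by convention, but simple, as the commit message says
    exec(open('configurator.py').read())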
Nat Friedman
2b9e168736
Strip unwanted prefix from state keys when loading model
2023-01-04 16:39:30 -08:00
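A sketch of the cleanup being described; state_dict and model are assumed to already exist, and the prefix shown is only an assumption about what a compiled model's checkpoint keys tend to carry:

    unwanted_prefix = '_orig_mod.'                   # assumed prefix, for illustration
    for k in list(state_dict.keys()):
        if k.startswith(unwanted_prefix):
            # rename the key in place, dropping the prefix
            state_dict[k[len(unwanted_prefix):]] = state_dict.pop(k)
    model.load_state_dict(state_dict)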
Andrej Karpathy
ab04701f9f
mention current 8GPU SOTA and shuffle sections a bit
2023-01-04 18:59:10 +00:00
Andrej
1eefbb2520
Merge pull request #16 from jorahn/patch-1
Update README.md
2023-01-04 09:08:50 -08:00