| Andrej Karpathy | d17350a31d | add support for character-level language models, a new character-level shakespeare dataset, a new config file that shows how to train a character-level baby GPT on it, and adjust the sample function to figure out if it should decode with characters or GPT2 bpe tokens. The current implementation is a bit hacky and basically assumes just these two possibilities. In the future we may want to support more general encoders or decoders. | 2023-01-11 05:27:19 +00:00 |
| Andrej Karpathy | c2a402f7f7 | guess the config from globals() and log all of it with wandb | 2023-01-11 01:00:22 +00:00 |
| Andrej Karpathy | 8b2e622b27 | adjust the readme to reflect changes in the autocast branch | 2023-01-08 19:40:46 +00:00 |
| Andrej Karpathy | b77c2e86d3 | copy pasting what seems to work to bench,sample as well. ty @lantiga | 2023-01-08 19:32:13 +00:00 |
| Andrej Karpathy | a855d316fd | add device and dtype support to train.py args | 2023-01-08 19:20:38 +00:00 |
| Andrej | e7cd674ce7 | Merge pull request #20 from lantiga/wandb-optional-import Make wandb import conditioned to wandb_log=True | 2023-01-08 10:19:40 -08:00 |
| Luca Antiga | 09f1f458e8 | Move conditional import | 2023-01-08 15:51:50 +01:00 |
| Luca Antiga | aba47f0a35 | Make wandb import conditioned to wandb_log=True | 2023-01-08 15:42:08 +01:00 |
| Andrej Karpathy | e53b9d28ff | ran readme through spellchecker heh | 2023-01-08 01:46:54 +00:00 |
| Andrej Karpathy | df3b8a57ab | tune the readme with new header image and the loss curve for 124M | 2023-01-08 00:41:14 +00:00 |
| Andrej Karpathy | d56bdf05a6 | progress! based on chinchilla author correspondence | 2023-01-07 02:42:30 +00:00 |
| Andrej Karpathy | 27fc6a4112 | small tweaks to notebook | 2023-01-06 02:13:04 +00:00 |
| Andrej Karpathy | 69d1a5f1af | update scaling laws. basically i can't reproduce any of params, flops, or scaling laws of the Chinchilla paper atm... | 2023-01-06 02:01:08 +00:00 |
| Andrej Karpathy | 9629093e53 | minor args re-arranging and removing some spurious ones like wandb entity ty @tcapelle | 2023-01-05 01:14:02 +00:00 |
| Andrej | 529c967a65 | Merge pull request #19 from nat/patch-1 Strip unwanted prefix from state keys when loading model in sample.py | 2023-01-04 16:46:32 -08:00 |
| Andrej Karpathy | d562b3e550 | shuttling the poor mans configurator aside into its own file and adding it to all of train,sample,bench. because i am leaving args in globals() so i can avoid having to prepend every single variable with an args., i have to exec the configurator and the optional configs. so we're left with something very gross by standard convention but also quite simple and functional. *ducks* | 2023-01-05 00:44:35 +00:00 |
| Nat Friedman | 2b9e168736 | Strip unwanted prefix from state keys when loading model | 2023-01-04 16:39:30 -08:00 |
| Andrej Karpathy | ab04701f9f | mention current 8GPU SOTA and shuffle sections a bit | 2023-01-04 18:59:10 +00:00 |
| Andrej | 1eefbb2520 | Merge pull request #16 from jorahn/patch-1 Update README.md | 2023-01-04 09:08:50 -08:00 |
| Jonathan Rahn | 26aa5f3ead | Update README.md | 2023-01-04 10:28:13 +01:00 |
| Andrej Karpathy | c72ecf5d93 | add a notebook trying to reproduce chinchilla scaling laws. I can't get the numbers to be exactly right, have to look at more | 2023-01-04 00:59:34 +00:00 |
| Andrej Karpathy | 5acba4b005 | ty lambda labs | 2023-01-03 21:16:07 +00:00 |
| Andrej Karpathy | 97fc42616e | adding few more dependencies | 2023-01-03 17:54:48 +00:00 |
| Andrej Karpathy | 9f95aca93e | better hyperparams for gpt2 124M model on A100 40GB. still uncertain about max_iters especially, and a bit about weight decay, betas | 2023-01-03 17:45:49 +00:00 |
| Andrej Karpathy | b45eec3e4b | flesh out the remaining TODOs in readme a bit more | 2023-01-03 07:41:28 +00:00 |
| Andrej Karpathy | 177d5f7dc5 | disabling torch.jit.script here for massive performance boost when using torch.compile, our default. see issue #11. thanks @vgoklani for flagging | 2023-01-02 23:05:01 +00:00 |
| Laiho | 0a2ea95338 | batch file write | 2023-01-02 17:49:21 +02:00 |
| Andrej Karpathy | ea4de192e0 | reshuffle args inside sample.py | 2023-01-02 02:11:39 +00:00 |
| Andrej Karpathy | ec9b1f8182 | add a patch to fix mysterious unwanted prefix in state dict? maybe remove later | 2023-01-02 01:25:02 +00:00 |
| Andrej Karpathy | 41184a27f5 | rename compile_model to compile, shroter, version 2 stragglers | 2023-01-02 01:15:55 +00:00 |
| Andrej Karpathy | 35f51974c4 | rename to compile it's shorter | 2023-01-02 01:14:46 +00:00 |
| Andrej Karpathy | 2febf4463c | candidate changes to apis, have to think through more | 2023-01-01 01:29:48 +00:00 |
| Andrej Karpathy | 7c6ea8409e | simplify the prepare script a lot, write only using one process, seems sufficient for now. ty @LaihoE for suggestion and @proger for flagging | 2022-12-30 22:18:20 +00:00 |
| Andrej Karpathy | d8abd21258 | typo fix in readme | 2022-12-30 00:07:58 +00:00 |
| Andrej Karpathy | 5a725d9098 | add torch.compile by default, shows almost 1.8X improvement in throughput nice | 2022-12-30 00:07:13 +00:00 |
| Andrej | fb52554ca8 | Merge pull request #1 from ankandrew/master Minor Frozen GPTConfig | 2022-12-29 13:45:20 -08:00 |
| ankandrew | 7f0e6d9a71 | Frozen GPTConfig | 2022-12-29 17:07:19 -03:00 |
| Andrej Karpathy | 682a0ac8f1 | properly resume training, also loading iter_num and best_val_loss from checkpoints | 2022-12-29 18:23:15 +00:00 |
| Andrej Karpathy | f88aa2c2fe | add link to mingpt | 2022-12-29 17:38:33 +00:00 |
| Andrej Karpathy | f2fc4be69b | mention 4gpu loss as well in readme | 2022-12-29 17:26:42 +00:00 |
| Andrej Karpathy | fa57d464d7 | pull out dtype up top | 2022-12-29 05:32:55 +00:00 |
| Andrej Karpathy | e7bac659f5 | oops missed one # have to fix | 2022-12-29 05:24:14 +00:00 |
| Andrej Karpathy | 97e2ab1b8d | enhance readme, add some todos | 2022-12-29 05:23:36 +00:00 |
| Andrej | cc11744131 | Add MIT LICENSE file | 2022-12-28 21:11:26 -08:00 |
| Andrej Karpathy | dea1507252 | add support for DDP training. the scaling timings right now do not look good by default, have to dig more into | 2022-12-29 05:06:07 +00:00 |
| Andrej Karpathy | ee6459f1d0 | readme tweaks | 2022-12-29 02:00:25 +00:00 |
| Andrej Karpathy | 3000cf5dda | add pytorch profiler support. not sure how to support both profiler and simple benchmarking, a bit gnarly atm hmm | 2022-12-29 01:49:53 +00:00 |
| Andrej Karpathy | b760ef1358 | add data loading into benchmarking as well, just for completeness | 2022-12-29 00:05:32 +00:00 |
| Andrej Karpathy | 70b5d93aee | add benchmarking script v0 | 2022-12-28 23:55:43 +00:00 |
| Andrej Karpathy | 5d2b4807bf | adding a lightweight configurator that may be a terrible mistake lol. also adding configs to evaluate the baseline GPT2 versions released by OpenAI on OWT. we have some ways to go to match those numbers atm | 2022-12-28 23:31:23 +00:00 |