vinjn 
							
						 
					 
					
						
						
							
						
						dccf362c2b 
					 
					
						
						
							
							Move enc to global namespace  
						
						
						
						
					 
					
						2024-01-12 12:53:20 -08:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						eba36e8464 
					 
					
						
						
							
							Merge pull request  #309  from ho2103/master  
						
						
						
						
						Fix AssertionError on macOS - need to check CUDA availability for bf16 
						
						
					 
					
						2023-06-22 08:24:17 -07:00 
						 
				 
			
				
					
						
							
							
								o 
							
						 
					 
					
						
						
							
						
						1eaceae193 
					 
					
						
						
							
							Fix AssertionError on macOS - need to check CUDA availability for bf16  
						
						
						
						
					 
					
						2023-06-19 18:05:09 -04:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						4eb7a96b07 
					 
					
						
						
							
							Merge pull request  #305  from okuvshynov/fix_osx_dataload  
						
						
						
						
						nanogpt: fix multiprocessing in load_dataset on os x 
						
						
					 
					
						2023-06-17 20:26:35 -07:00 
						 
				 
			
				
					
						
							
							
								Oleksandr Kuvshynov 
							
						 
					 
					
						
						
							
						
						542ac51d1f 
					 
					
						
						
							
							nanogpt: fix multiprocessing in load_dataset on os x  
						
						
						
						
						The issue seems to be that _fixup_main_from_path in Python's multiprocessing
module is unable to find the entry point, so this change adds
```
if __name__ == '__main__':
``` 
						
						
					 
					
						2023-06-17 20:35:38 -04:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						41d7014f7d 
					 
					
						
						
							
							Merge pull request  #301  from okuvshynov/master  
						
						
						
						
						[easy] allow multithreading in load_dataset 
						
						
					 
					
						2023-06-16 18:30:03 -07:00 
						 
				 
			
				
					
						
							
							
								Oleksandr Kuvshynov 
							
						 
					 
					
						
						
							
						
						bb7e96754a 
					 
					
						
						
							
							nanogpt: allow multithreading in load dataset  
						
						
						
						
					 
					
						2023-06-16 20:00:17 -04:00 
						 
				 
			
				
					
						
							
							
								Andrej Karpathy 
							
						 
					 
					
						
						
							
						
						7339b904ef 
					 
					
						
						
							
							use WORLD_SIZE instead of device_count; this supports both training on fewer GPUs than are available and multinode training, and may be a bugfix  
						
						
						
						
					 
					
						2023-06-14 23:33:07 +00:00 
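
Commit 7339b904ef above switches from the local device count to the WORLD_SIZE environment variable. The diff itself is not shown in this log, so the following is only a sketch of the idea, assuming a launcher such as torchrun that exports WORLD_SIZE. The helper names `world_size` and `per_process_steps`, and the step-splitting use case, are illustrative assumptions rather than the repository's code:

```python
import os

def world_size(env=None):
    # WORLD_SIZE is the total number of participating processes across all
    # nodes, as exported by launchers such as torchrun. Fall back to 1 when
    # the script is started directly without a launcher.
    env = os.environ if env is None else env
    return int(env.get("WORLD_SIZE", "1"))

def per_process_steps(total_accum_steps, env=None):
    # Hypothetical helper: split accumulation steps evenly across processes.
    n = world_size(env)
    if total_accum_steps % n != 0:
        raise ValueError("accumulation steps must be divisible by WORLD_SIZE")
    return total_accum_steps // n

# Two nodes running four processes each report WORLD_SIZE=8, even though
# each node only sees four local GPUs.
print(per_process_steps(40, {"WORLD_SIZE": "8"}))  # 5
```

A count taken from the locally visible devices would be wrong in both situations the commit message names: it overcounts when fewer processes are launched than GPUs exist, and undercounts when several nodes take part.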
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						f08abb45bd 
					 
					
						
						
							
							Merge pull request  #274  from apivovarov/gelu  
						
						
						
						
						Use nn.GELU - 1.27x faster training 
						
						
					 
					
						2023-06-14 16:25:15 -07:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						18ee6b62b6 
					 
					
						
						
							
							Merge pull request  #275  from apivovarov/rm_unsqueeze  
						
						
						
						
						Remove pos unsqueeze(0) 
						
						
					 
					
						2023-06-14 15:38:45 -07:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						ed7887c888 
					 
					
						
						
							
							Merge pull request  #270  from LaihoE/master  
						
						
						
						
						fix np.sum overflows on windows 
						
						
					 
					
						2023-06-14 15:36:26 -07:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						8020bb582b 
					 
					
						
						
							
							Merge pull request  #276  from apivovarov/gitign  
						
						
						
						
						Add more files to .gitignore 
						
						
					 
					
						2023-06-14 15:30:39 -07:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						0f06d9b889 
					 
					
						
						
							
							Merge pull request  #277  from apivovarov/is_bf16_supported  
						
						
						
						
						Use bf16 only if supported 
						
						
					 
					
						2023-06-14 15:29:50 -07:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						cf4835ed6f 
					 
					
						
						
							
							Merge pull request  #286  from ctjlewis/master  
						
						
						
						
						docs: simplify dependencies installation 
						
						
					 
					
						2023-06-14 15:21:04 -07:00 
						 
				 
			
				
					
						
							
							
								Lewis 
							
						 
					 
					
						
						
							
						
						eeac8732b9 
					 
					
						
						
							
							docs: simplify dependencies installation  
						
						
						
						
						Adds a `pip install ...` command that will install all necessary dependencies, while retaining original dependency notes. Added quick description of `tqdm` as well. 
						
						
					 
					
						2023-05-31 23:04:08 -05:00 
						 
				 
			
				
					
						
							
							
								Alexander Pivovarov 
							
						 
					 
					
						
						
							
						
						eb33b8bf1c 
					 
					
						
						
							
							Use bf16 only if supported  
						
						
						
						
					 
					
						2023-05-17 03:26:48 +00:00 
						 
				 
			
				
					
						
							
							
								Alexander Pivovarov 
							
						 
					 
					
						
						
							
						
						b120c421bf 
					 
					
						
						
							
							Add more files to .gitignore  
						
						
						
						
					 
					
						2023-05-17 02:50:22 +00:00 
						 
				 
			
				
					
						
							
							
								Alexander Pivovarov 
							
						 
					 
					
						
						
							
						
						39ae397a93 
					 
					
						
						
							
							Remove pos unsqueeze(0)  
						
						
						
						
					 
					
						2023-05-17 02:30:18 +00:00 
						 
				 
			
				
					
						
							
							
								Alexander Pivovarov 
							
						 
					 
					
						
						
							
						
						594068e7ae 
					 
					
						
						
							
							Use nn.GELU  
						
						
						
						
					 
					
						2023-05-17 00:53:35 +00:00 
						 
				 
			
				
					
						
							
							
								Laiho 
							
						 
					 
					
						
						
							
						
						6649b299eb 
					 
					
						
						
							
							np.sum overflows on windows  
						
						
						
						
					 
					
						2023-05-09 16:36:59 +03:00 
						 
				 
			
				
					
						
							
							
								Andrej Karpathy 
							
						 
					 
					
						
						
							
						
						7fe4a099ad 
					 
					
						
						
							
							simplify configure_optimizers by a lot  
						
						
						
						
					 
					
						2023-05-06 14:40:28 +00:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						196160b849 
					 
					
						
						
							
							Merge pull request  #247  from gnobre/macbook-run-instructions  
						
						
						
						
						Macbook run instructions 
						
						
					 
					
						2023-04-17 20:16:31 -07:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						21f9bff7e4 
					 
					
						
						
							
							Merge pull request  #225  from otaviogood/grad_accum  
						
						
						
						
						Fix for gradient_accumulation_steps training slow 
						
						
					 
					
						2023-04-17 20:11:25 -07:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						a6a708c7f1 
					 
					
						
						
							
							Merge branch 'master' into grad_accum  
						
						
						
						
					 
					
						2023-04-17 20:11:00 -07:00 
						 
				 
			
				
					
						
							
							
								Guilherme Nobre 
							
						 
					 
					
						
						
							
						
						e30c8fda23 
					 
					
						
						
							
							Merge branch 'karpathy:master' into macbook-run-instructions  
						
						
						
						
					 
					
						2023-04-15 09:50:58 +01:00 
						 
				 
			
				
					
						
							
							
								Guilherme 
							
						 
					 
					
						
						
							
						
						4732c43af3 
					 
					
						
						
							
							add macbook specific instructions to generate samples  
						
						
						
						
					 
					
						2023-04-15 09:49:38 +01:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						d9f4735f5e 
					 
					
						
						
							
							Merge pull request  #10  from LaihoE/master  
						
						
						
						
						batch file write 
						
						
					 
					
						2023-04-13 00:39:41 -07:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						b288f4cfb2 
					 
					
						
						
							
							Merge pull request  #146  from lutzroeder/master  
						
						
						
						
						Add .gitignore 
						
						
					 
					
						2023-04-12 22:48:37 -07:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						079df20748 
					 
					
						
						
							
							Merge pull request  #74  from venusatuluri/fix_decode  
						
						
						
						
						Small fix to decode fn in shakespeare_char/prepare.py 
						
						
					 
					
						2023-04-12 22:45:01 -07:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						01e48ec1ab 
					 
					
						
						
							
							Merge pull request  #240  from YassineYousfi/master  
						
						
						
						
						don't dropout in eval mode 
						
						
					 
					
						2023-04-12 22:43:59 -07:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						7840a66859 
					 
					
						
						
							
							Merge pull request  #54  from MicroPanda123/luv  
						
						
						
						
						Give tqdm some love :) 
						
						
					 
					
						2023-04-12 22:25:18 -07:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						8abe215fba 
					 
					
						
						
							
							Merge pull request  #128  from abrahamsangha/fix-typo  
						
						
						
						
						fix typo 
						
						
					 
					
						2023-04-12 22:24:41 -07:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						ad62003d7a 
					 
					
						
						
							
							Merge pull request  #142  from kovkev/patch-1  
						
						
						
						
						Fix the position of a comma 
						
						
					 
					
						2023-04-12 22:24:06 -07:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						ea24604b29 
					 
					
						
						
							
							Merge pull request  #220  from python273/patch-1  
						
						
						
						
						Fix GPT.crop_block_size when flash attention is available 
						
						
					 
					
						2023-04-12 22:13:01 -07:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						8aeea6d970 
					 
					
						
						
							
							Merge pull request  #224  from SnehalRaj/patch-1  
						
						
						
						
						fix small typo 
						
						
					 
					
						2023-04-12 22:12:26 -07:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						2457471c9c 
					 
					
						
						
							
							Merge pull request  #236  from ymurenko/master  
						
						
						
						
						fix "cuda out of memory" when resuming training 
						
						
					 
					
						2023-04-12 22:09:42 -07:00 
						 
				 
			
				
					
						
							
							
								Andrej Karpathy 
							
						 
					 
					
						
						
							
						
						553f949f46 
					 
					
						
						
							
							Fix a minor bug: the loss has to be scaled to account for gradient accumulation, which sums before backprop. Note that this is not a major bug because AdamW is scale invariant. However, it did affect gradient clipping.  
						
						
						
						
					 
					
						2023-04-13 04:59:11 +00:00 
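
Commit 553f949f46 above explains that gradient accumulation sums contributions, so each micro-batch loss has to be scaled down. A minimal sketch of that arithmetic in plain Python, assuming a toy squared-error loss on a single weight; the function names, data, and clipping threshold are hypothetical and not taken from the repository:

```python
def grad_sq_error(w, x, y):
    # Gradient with respect to w of the loss (w * x - y) ** 2.
    return 2.0 * x * (w * x - y)

def accumulated_grad(w, micro_batches, scale_loss):
    # Sum per-micro-batch gradients, as gradient accumulation does.
    steps = len(micro_batches)
    total = 0.0
    for x, y in micro_batches:
        g = grad_sq_error(w, x, y)
        # Dividing the loss by `steps` divides its gradient by `steps` too.
        total += g / steps if scale_loss else g
    return total

def clip(g, max_norm):
    # Scalar version of norm clipping: rescale only when |g| exceeds max_norm.
    norm = abs(g)
    return g if norm <= max_norm else g * (max_norm / norm)

micro_batches = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 9.0)]
w = 0.5
unscaled = accumulated_grad(w, micro_batches, scale_loss=False)
scaled = accumulated_grad(w, micro_batches, scale_loss=True)

print(unscaled, scaled)    # -88.0 -22.0
print(clip(scaled, 30.0))  # -22.0, below the threshold, left untouched
```

With four micro-batches the unscaled sum is four times the averaged gradient. A fixed clipping threshold treats the two very differently, which matches the commit's remark that clipping was affected even though the optimizer update itself was not.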
						 
				 
			
				
					
						
							
							
								Yassine Yousfi 
							
						 
					 
					
						
						
							
						
						7399dfe39d 
					 
					
						
						
							
							don't always dropout!  
						
						
						
						
					 
					
						2023-04-10 22:56:22 -07:00 
						 
				 
			
				
					
						
							
							
								ymurenko 
							
						 
					 
					
						
						
							
						
						4ac2e8ce3a 
					 
					
						
						
							
							fix "cuda out of memory" when resuming training  
						
						
						
						
					 
					
						2023-04-05 17:28:55 -04:00 
						 
				 
			
				
					
						
							
							
								Snehal Raj 
							
						 
					 
					
						
						
							
						
						c58fc4605c 
					 
					
						
						
							
							fix small typo  
						
						
						
						
					 
					
						2023-03-25 20:36:46 +01:00 
						 
				 
			
				
					
						
							
							
								Otavio Good 
							
						 
					 
					
						
						
							
						
						978d4fe538 
					 
					
						
						
							
							Fix for gradient_accumulation_steps training slow  
						
						
						
						
					 
					
						2023-03-25 00:04:45 -07:00 
						 
				 
			
				
					
						
							
							
								Kirill 
							
						 
					 
					
						
						
							
						
						c3f254844d 
					 
					
						
						
							
							Fix GPT.crop_block_size when flash attention is available  
						
						
						
						
					 
					
						2023-03-24 14:51:02 +03:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						a82b33b525 
					 
					
						
						
							
							Merge pull request  #199  from ChristianOrr/patch-1  
						
						
						
						
						bugfix in decode function 
						
						
					 
					
						2023-03-12 13:40:20 -07:00 
						 
				 
			
				
					
						
							
							
								Christian Orr 
							
						 
					 
					
						
						
							
						
						36c7db8c44 
					 
					
						
						
							
							bugfix in decode function  
						
						
						
						
						Return was left out of the decoder, so it didn't work. 
						
						
					 
					
						2023-03-08 10:16:19 +02:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						0d8fbd11ae 
					 
					
						
						
							
							Merge pull request  #195  from drisspg/enable_sdpa_with_nonzero_dropout  
						
						
						
						
						Enable sdpa for nonzero dropout 
						
						
					 
					
						2023-03-06 21:47:20 -08:00 
						 
				 
			
				
					
						
							
							
								Driss Guessous 
							
						 
					 
					
						
						
							
						
						6170531b8a 
					 
					
						
						
							
							enable sdpa for nonzero dropout  
						
						
						
						
					 
					
						2023-03-05 19:29:29 +00:00 
						 
				 
			
				
					
						
							
							
								Andrej 
							
						 
					 
					
						
						
							
						
						ae3a8d5fdd 
					 
					
						
						
							
							Merge pull request  #145  from otaviogood/gradAccumStability  
						
						
						
						
						fix for training stability on single GPU 
						
						
					 
					
						2023-02-14 18:48:54 -08:00 
						 
				 
			
				
					
						
							
							
								Lutz Roeder 
							
						 
					 
					
						
						
							
						
						10046a2ec0 
					 
					
						
						
							
							Add .gitignore  
						
						
						
						
					 
					
						2023-02-13 13:57:20 -08:00 
						 
				 
			
				
					
						
							
							
								Otavio Good 
							
						 
					 
					
						
						
							
						
						086ebe1822 
					 
					
						
						
							
							fix for training stability on single GPU  
						
						
						
						
					 
					
						2023-02-13 10:42:44 -08:00 
						 
				 
			
				
					
						
							
							
								kovkev 
							
						 
					 
					
						
						
							
						
						c2531159c7 
					 
					
						
						
							
							Fix the position of a comma  
						
						
						
						
					 
					
						2023-02-11 17:13:24 -08:00