nanogpt-experiments

mirror of https://github.com/osmarks/nanogpt-experiments.git synced 2025-04-27 21:23:10 +00:00

Author	SHA1	Message	Date
Adam Isakov	f35dc82437	fix: prepare.py - added input file opening in UTF-8 encoding	2024-01-26 01:34:44 +03:00
Adam Isakov	b7e194a756	feature: .gitignore - added venv folders	2024-01-26 01:10:10 +03:00
Andrej	eba36e8464	Merge pull request #309 from ho2103/master Fix AssertionError on macOS - need to check CUDA availability for bf16	2023-06-22 08:24:17 -07:00
o	1eaceae193	Fix AssertionError on macOS - need to check CUDA availability for bf16	2023-06-19 18:05:09 -04:00
Andrej	4eb7a96b07	Merge pull request #305 from okuvshynov/fix_osx_dataload nanogpt: fix multiprocessing in load_dataset on os x	2023-06-17 20:26:35 -07:00
Oleksandr Kuvshynov	542ac51d1f	nanogpt: fix multiprocessing in load_dataset on os x. The issue seems to be that _fixup_main_from_path in the multiprocessing module in Python is unable to find the entry point, thus adding `if __name__ == '__main__'`.	2023-06-17 20:35:38 -04:00
Andrej	41d7014f7d	Merge pull request #301 from okuvshynov/master [easy] allow multithreading in load_dataset	2023-06-16 18:30:03 -07:00
Oleksandr Kuvshynov	bb7e96754a	nanogpt: allow multithreading in load dataset	2023-06-16 20:00:17 -04:00
Andrej Karpathy	7339b904ef	use WORLD_SIZE instead of device_count, supports both the case where the number of gpus we train on is smaller than gpus available, and also multinode training. may be a bugfix	2023-06-14 23:33:07 +00:00
Andrej	f08abb45bd	Merge pull request #274 from apivovarov/gelu Use nn.GELU - 1.27x faster training	2023-06-14 16:25:15 -07:00
Andrej	18ee6b62b6	Merge pull request #275 from apivovarov/rm_unsqueeze Remove pos unsqueeze(0)	2023-06-14 15:38:45 -07:00
Andrej	ed7887c888	Merge pull request #270 from LaihoE/master fix np.sum overflows on windows	2023-06-14 15:36:26 -07:00
Andrej	8020bb582b	Merge pull request #276 from apivovarov/gitign Add more files to .gitignore	2023-06-14 15:30:39 -07:00
Andrej	0f06d9b889	Merge pull request #277 from apivovarov/is_bf16_supported Use bf16 only if supported	2023-06-14 15:29:50 -07:00
Andrej	cf4835ed6f	Merge pull request #286 from ctjlewis/master docs: simplify dependencies installation	2023-06-14 15:21:04 -07:00
Lewis	eeac8732b9	docs: simplify dependencies installation Adds a `pip install ...` command that will install all necessary dependencies, while retaining original dependency notes. Added quick description of `tqdm` as well.	2023-05-31 23:04:08 -05:00
Alexander Pivovarov	eb33b8bf1c	Use bf16 only if supported	2023-05-17 03:26:48 +00:00
Alexander Pivovarov	b120c421bf	Add more files to .gitignore	2023-05-17 02:50:22 +00:00
Alexander Pivovarov	39ae397a93	Remove pos unsqueeze(0)	2023-05-17 02:30:18 +00:00
Alexander Pivovarov	594068e7ae	Use nn.GELU	2023-05-17 00:53:35 +00:00
Laiho	6649b299eb	np.sum overflows on windows	2023-05-09 16:36:59 +03:00
Andrej Karpathy	7fe4a099ad	simplify configure_optimizers by a lot	2023-05-06 14:40:28 +00:00
Andrej	196160b849	Merge pull request #247 from gnobre/macbook-run-instructions Macbook run instructions	2023-04-17 20:16:31 -07:00
Andrej	21f9bff7e4	Merge pull request #225 from otaviogood/grad_accum Fix for gradient_accumulation_steps training slow	2023-04-17 20:11:25 -07:00
Andrej	a6a708c7f1	Merge branch 'master' into grad_accum	2023-04-17 20:11:00 -07:00
Guilherme Nobre	e30c8fda23	Merge branch 'karpathy:master' into macbook-run-instructions	2023-04-15 09:50:58 +01:00
Guilherme	4732c43af3	add macbook specific instructions to generate samples	2023-04-15 09:49:38 +01:00
Andrej	d9f4735f5e	Merge pull request #10 from LaihoE/master batch file write	2023-04-13 00:39:41 -07:00
Andrej	b288f4cfb2	Merge pull request #146 from lutzroeder/master Add .gitignore	2023-04-12 22:48:37 -07:00
Andrej	079df20748	Merge pull request #74 from venusatuluri/fix_decode Small fix to decode fn in shakespeare_char/prepare.py	2023-04-12 22:45:01 -07:00
Andrej	01e48ec1ab	Merge pull request #240 from YassineYousfi/master don't dropout in eval mode	2023-04-12 22:43:59 -07:00
Andrej	7840a66859	Merge pull request #54 from MicroPanda123/luv Give tqdm some love :)	2023-04-12 22:25:18 -07:00
Andrej	8abe215fba	Merge pull request #128 from abrahamsangha/fix-typo fix typo	2023-04-12 22:24:41 -07:00
Andrej	ad62003d7a	Merge pull request #142 from kovkev/patch-1 Fix the position of a comma	2023-04-12 22:24:06 -07:00
Andrej	ea24604b29	Merge pull request #220 from python273/patch-1 Fix GPT.crop_block_size when flash attention is available	2023-04-12 22:13:01 -07:00
Andrej	8aeea6d970	Merge pull request #224 from SnehalRaj/patch-1 fix small typo	2023-04-12 22:12:26 -07:00
Andrej	2457471c9c	Merge pull request #236 from ymurenko/master fix "cuda out of memory" when resuming training	2023-04-12 22:09:42 -07:00
Andrej Karpathy	553f949f46	fix minor bug where we have to scale the loss to account for gradient accumulation, which sums before backprop. note that this is not a major bug because AdamW is scale invariant. however, this did affect gradient clipping	2023-04-13 04:59:11 +00:00
Yassine Yousfi	7399dfe39d	dont always dropout!	2023-04-10 22:56:22 -07:00
ymurenko	4ac2e8ce3a	fix "cuda out of memory" when resuming training	2023-04-05 17:28:55 -04:00
Snehal Raj	c58fc4605c	fix small typo	2023-03-25 20:36:46 +01:00
Otavio Good	978d4fe538	Fix for gradient_accumulation_steps training slow	2023-03-25 00:04:45 -07:00
Kirill	c3f254844d	Fix GPT.crop_block_size when flash attention is available	2023-03-24 14:51:02 +03:00
Andrej	a82b33b525	Merge pull request #199 from ChristianOrr/patch-1 bugfix in decode function	2023-03-12 13:40:20 -07:00
Christian Orr	36c7db8c44	bugfix in decode function Return was left out of the decoder, so it didn't work.	2023-03-08 10:16:19 +02:00
Andrej	0d8fbd11ae	Merge pull request #195 from drisspg/enable_sdpa_with_nonzero_dropout Enable sdpa for nonzero dropout	2023-03-06 21:47:20 -08:00
Driss Guessous	6170531b8a	enable sdpa for nonzero dropout	2023-03-05 19:29:29 +00:00
Andrej	ae3a8d5fdd	Merge pull request #145 from otaviogood/gradAccumStability fix for training stability on single GPU	2023-02-14 18:48:54 -08:00
Lutz Roeder	10046a2ec0	Add .gitignore	2023-02-13 13:57:20 -08:00
Otavio Good	086ebe1822	fix for training stability on single GPU	2023-02-13 10:42:44 -08:00

197 Commits