mirror of https://github.com/osmarks/nanogpt-experiments.git synced 2024-12-18 14:10:28 +00:00
Commit Graph

200 Commits

Author SHA1 Message Date
Andrej
a022d02ee2
Merge pull request #429 from adambala/fixes
Open "shakespeare" data in UTF-8 in "prepare.py"
2024-02-27 09:05:44 -08:00
Andrej
f68ac2200d
Merge pull request #428 from kjslag/memmap-memory-leak
fix np.memmap memory leak
2024-02-27 08:41:24 -08:00
Adam Isakov
f35dc82437 fix: prepare.py - added input file opening in UTF-8 encoding 2024-01-26 01:34:44 +03:00
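A minimal sketch of this fix, assuming the `input.txt` layout of the Shakespeare `prepare.py`: opening the file with an explicit encoding avoids depending on the platform's default, which is not UTF-8 on some Windows locales.
```
# Sketch of the UTF-8 fix; 'input.txt' follows the shakespeare prepare.py layout.
with open('input.txt', 'r', encoding='utf-8') as f:
    data = f.read()  # explicit encoding, independent of the OS locale
```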
Adam Isakov
b7e194a756 feature: .gitignore - added venv folders 2024-01-26 01:10:10 +03:00
Kevin Slagle
5156fef93c
fix np.memmap memory leak
np.memmap doesn't free the memory it accesses, so once the dataset has been fully read, it all ends up resident in RAM. The simplest workaround (from Stack Overflow) is to just recreate the memmap for each batch; the extra overhead is negligible.

https://stackoverflow.com/questions/45132940/numpy-memmap-memory-usage-want-to-iterate-once/61472122#61472122
2024-01-25 11:41:01 -08:00
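A minimal sketch of the workaround described above, assuming nanoGPT's uint16 token files: recreating the memmap per batch lets the OS reclaim pages instead of keeping the whole dataset resident.
```
import numpy as np
import torch

def get_batch(data_file, block_size, batch_size):
    # Recreate the memmap on every call: a long-lived np.memmap keeps every
    # page it has touched in RAM, so the full dataset eventually ends up resident.
    data = np.memmap(data_file, dtype=np.uint16, mode='r')
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([torch.from_numpy(data[i:i+block_size].astype(np.int64)) for i in ix])
    y = torch.stack([torch.from_numpy(data[i+1:i+1+block_size].astype(np.int64)) for i in ix])
    return x, y
```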
Andrej
eba36e8464
Merge pull request #309 from ho2103/master
Fix AssertionError on macOS - need to check CUDA availability for bf16
2023-06-22 08:24:17 -07:00
o
1eaceae193 Fix AssertionError on macOS - need to check CUDA availability for bf16 2023-06-19 18:05:09 -04:00
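A sketch of the check behind this fix: choose bf16 only when CUDA is actually available and the device reports bf16 support, falling back to fp16 otherwise. Short-circuit evaluation keeps the support query from running on machines without CUDA (such as macOS).
```
import torch

# Check CUDA availability first; querying bf16 support without a CUDA
# device is what tripped the AssertionError on macOS.
dtype = 'bfloat16' if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else 'float16'
```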
Andrej
4eb7a96b07
Merge pull request #305 from okuvshynov/fix_osx_dataload
nanogpt: fix multiprocessing in load_dataset on os x
2023-06-17 20:26:35 -07:00
Oleksandr Kuvshynov
542ac51d1f nanogpt: fix multiprocessing in load_dataset on os x
The issue seems to be that _fixup_main_from_path in Python's
multiprocessing module is unable to find the entry point; adding a
```
if __name__ == '__main__':
```
guard fixes it.
2023-06-17 20:35:38 -04:00
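A minimal sketch of the guarded entry point, assuming a prepare.py that uses HuggingFace `load_dataset` with worker processes: macOS uses the spawn start method, which re-imports the script in each worker, so module-level work must live behind the guard.
```
from datasets import load_dataset  # huggingface datasets

def main():
    # dataset name follows nanoGPT's openwebtext prepare.py
    dataset = load_dataset("openwebtext")
    # ... tokenize via dataset.map(..., num_proc=8) ...

if __name__ == '__main__':
    main()  # spawn-based workers re-import this file without re-running main()
```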
Andrej
41d7014f7d
Merge pull request #301 from okuvshynov/master
[easy] allow multithreading in load_dataset
2023-06-16 18:30:03 -07:00
Oleksandr Kuvshynov
bb7e96754a nanogpt: allow multithreading in load dataset 2023-06-16 20:00:17 -04:00
Andrej Karpathy
7339b904ef use WORLD_SIZE instead of device_count; supports both the case where we train on fewer GPUs than are available, and also multinode training. May be a bugfix. 2023-06-14 23:33:07 +00:00
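A sketch of what this changes, with hypothetical batch numbers: WORLD_SIZE comes from the launcher (e.g. torchrun) and counts all participating processes, while `torch.cuda.device_count()` only sees the local node's GPUs.
```
import os

gradient_accumulation_steps, batch_size, block_size = 40, 12, 1024  # made-up values
ddp_world_size = int(os.environ.get('WORLD_SIZE', 1))  # set by the launcher
tokens_per_iter = gradient_accumulation_steps * ddp_world_size * batch_size * block_size
print(f"tokens per iteration: {tokens_per_iter:,}")
```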
Andrej
f08abb45bd
Merge pull request #274 from apivovarov/gelu
Use nn.GELU - 1.27x faster training
2023-06-14 16:25:15 -07:00
Andrej
18ee6b62b6
Merge pull request #275 from apivovarov/rm_unsqueeze
Remove pos unsqueeze(0)
2023-06-14 15:38:45 -07:00
Andrej
ed7887c888
Merge pull request #270 from LaihoE/master
fix np.sum overflows on windows
2023-06-14 15:36:26 -07:00
Andrej
8020bb582b
Merge pull request #276 from apivovarov/gitign
Add more files to .gitignore
2023-06-14 15:30:39 -07:00
Andrej
0f06d9b889
Merge pull request #277 from apivovarov/is_bf16_supported
Use bf16 only if supported
2023-06-14 15:29:50 -07:00
Andrej
cf4835ed6f
Merge pull request #286 from ctjlewis/master
docs: simplify dependencies installation
2023-06-14 15:21:04 -07:00
Lewis
eeac8732b9
docs: simplify dependencies installation
Adds a `pip install ...` command that installs all necessary dependencies, while retaining the original dependency notes. Also adds a quick description of `tqdm`.
2023-05-31 23:04:08 -05:00
Alexander Pivovarov
eb33b8bf1c Use bf16 only if supported 2023-05-17 03:26:48 +00:00
Alexander Pivovarov
b120c421bf Add more files to .gitignore 2023-05-17 02:50:22 +00:00
Alexander Pivovarov
39ae397a93 Remove pos unsqueeze(0) 2023-05-17 02:30:18 +00:00
Alexander Pivovarov
594068e7ae Use nn.GELU 2023-05-17 00:53:35 +00:00
Laiho
6649b299eb np.sum overflows on windows 2023-05-09 16:36:59 +03:00
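A sketch of the overflow and its fix: on Windows, NumPy's default accumulator for integer sums is a 32-bit C long, so totaling token counts over a large corpus can wrap around. The counts below are made up.
```
import numpy as np

lengths = [2**20] * 4096                   # hypothetical per-document token counts
total = np.sum(lengths, dtype=np.uint64)   # 64-bit accumulator avoids the wrap
assert total == 2**32                      # this sum would overflow 32 bits
```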
Andrej Karpathy
7fe4a099ad simplify configure_optimizers by a lot 2023-05-06 14:40:28 +00:00
Andrej
196160b849
Merge pull request #247 from gnobre/macbook-run-instructions
Macbook run instructions
2023-04-17 20:16:31 -07:00
Andrej
21f9bff7e4
Merge pull request #225 from otaviogood/grad_accum
Fix for gradient_accumulation_steps training slow
2023-04-17 20:11:25 -07:00
Andrej
a6a708c7f1
Merge branch 'master' into grad_accum 2023-04-17 20:11:00 -07:00
Guilherme Nobre
e30c8fda23
Merge branch 'karpathy:master' into macbook-run-instructions 2023-04-15 09:50:58 +01:00
Guilherme
4732c43af3 add macbook specific instructions to generate samples 2023-04-15 09:49:38 +01:00
Andrej
d9f4735f5e
Merge pull request #10 from LaihoE/master
batch file write
2023-04-13 00:39:41 -07:00
Andrej
b288f4cfb2
Merge pull request #146 from lutzroeder/master
Add .gitignore
2023-04-12 22:48:37 -07:00
Andrej
079df20748
Merge pull request #74 from venusatuluri/fix_decode
Small fix to decode fn in shakespeare_char/prepare.py
2023-04-12 22:45:01 -07:00
Andrej
01e48ec1ab
Merge pull request #240 from YassineYousfi/master
don't dropout in eval mode
2023-04-12 22:43:59 -07:00
Andrej
7840a66859
Merge pull request #54 from MicroPanda123/luv
Give tqdm some love :)
2023-04-12 22:25:18 -07:00
Andrej
8abe215fba
Merge pull request #128 from abrahamsangha/fix-typo
fix typo
2023-04-12 22:24:41 -07:00
Andrej
ad62003d7a
Merge pull request #142 from kovkev/patch-1
Fix the position of a comma
2023-04-12 22:24:06 -07:00
Andrej
ea24604b29
Merge pull request #220 from python273/patch-1
Fix GPT.crop_block_size when flash attention is available
2023-04-12 22:13:01 -07:00
Andrej
8aeea6d970
Merge pull request #224 from SnehalRaj/patch-1
fix small typo
2023-04-12 22:12:26 -07:00
Andrej
2457471c9c
Merge pull request #236 from ymurenko/master
fix "cuda out of memory" when resuming training
2023-04-12 22:09:42 -07:00
Andrej Karpathy
553f949f46 fix minor bug where we have to scale the loss to account for gradient accumulation, which sums before backprop. note that this is not a major bug because AdamW is scale invariant. however, this did affect gradient clipping 2023-04-13 04:59:11 +00:00
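A self-contained sketch of that scaling, with a toy model standing in for GPT: micro-batch gradients sum across accumulation steps, so each micro-loss is divided by the step count before backward, and clipping then sees gradients at the intended scale.
```
import torch

model = torch.nn.Linear(16, 1)  # toy stand-in for the GPT model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
gradient_accumulation_steps = 4

for micro_step in range(gradient_accumulation_steps):
    x, y = torch.randn(8, 16), torch.randn(8, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    (loss / gradient_accumulation_steps).backward()  # the fix: scale before backward
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # now clips at the right scale
optimizer.step()
optimizer.zero_grad(set_to_none=True)
```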
Yassine Yousfi
7399dfe39d dont always dropout! 2023-04-10 22:56:22 -07:00
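A sketch of the eval-mode fix, assuming an attention module built on PyTorch's scaled_dot_product_attention: dropout should fire only while training, so dropout_p is gated on self.training.
```
import torch
import torch.nn.functional as F

class CausalSelfAttention(torch.nn.Module):  # simplified illustration
    def __init__(self, dropout=0.1):
        super().__init__()
        self.dropout = dropout

    def forward(self, q, k, v):
        return F.scaled_dot_product_attention(
            q, k, v,
            dropout_p=self.dropout if self.training else 0.0,  # the fix
            is_causal=True,
        )
```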
ymurenko
4ac2e8ce3a fix "cuda out of memory" when resuming training 2023-04-05 17:28:55 -04:00
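A sketch of the resume fix, with placeholder names: once the state dicts are loaded, the checkpoint dict is released so its extra tensor copies don't linger and exhaust GPU memory.
```
import torch

model = torch.nn.Linear(16, 16)            # stand-in for the GPT model
optimizer = torch.optim.AdamW(model.parameters())

checkpoint = torch.load('ckpt.pt', map_location='cpu')  # placeholder path
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
checkpoint = None  # the fix: drop the reference so the copies can be freed
```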
Snehal Raj
c58fc4605c
fix small typo 2023-03-25 20:36:46 +01:00
Otavio Good
978d4fe538 Fix for gradient_accumulation_steps training slow 2023-03-25 00:04:45 -07:00
Kirill
c3f254844d
Fix GPT.crop_block_size when flash attention is available 2023-03-24 14:51:02 +03:00
Andrej
a82b33b525
Merge pull request #199 from ChristianOrr/patch-1
bugfix in decode function
2023-03-12 13:40:20 -07:00
Christian Orr
36c7db8c44
bugfix in decode function
Return was left out of the decoder, so it didn't work.
2023-03-08 10:16:19 +02:00
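A minimal sketch of the repaired decode, with a toy vocabulary in place of the Shakespeare character set: the function previously built the string but never returned it.
```
chars = sorted(set("hello world"))           # toy character vocabulary
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def decode(ids):
    return ''.join(itos[i] for i in ids)     # the fix: the return was missing

assert decode([stoi[c] for c in "hello"]) == "hello"
```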
Andrej
0d8fbd11ae
Merge pull request #195 from drisspg/enable_sdpa_with_nonzero_dropout
Enable sdpa for nonzero dropout
2023-03-06 21:47:20 -08:00
Driss Guessous
6170531b8a enable sdpa for nonzero dropout 2023-03-05 19:29:29 +00:00