Kevin Slagle
5156fef93c
fix np.memmap memory leak
...
np.memmap doesn't free memory that it accesses. Thus, the entire dataset ends up stored in RAM once the dataset has been fully accessed. The simplest workaround on stackoverflow is to just recreate the memmap for each batch. The extra overhead is negligible.
https://stackoverflow.com/questions/45132940/numpy-memmap-memory-usage-want-to-iterate-once/61472122#61472122
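A minimal sketch of the workaround described above, assuming the dataset is a flat file of uint16 token ids. The function name `get_batch`, its parameters, and the toy file are illustrative assumptions, not the code from this commit:

```python
import os
import tempfile

import numpy as np


def get_batch(path, batch_size, block_size, rng):
    # Re-open the memmap on every call instead of keeping one long-lived
    # np.memmap object, so pages touched by earlier batches can be released.
    data = np.memmap(path, dtype=np.uint16, mode="r")
    # Random start offsets; the upper bound leaves room for the shifted target.
    ix = rng.integers(0, len(data) - block_size, size=batch_size)
    # Copy the slices out of the memmap so nothing returned references it.
    x = np.stack([np.asarray(data[i : i + block_size]).astype(np.int64) for i in ix])
    y = np.stack([np.asarray(data[i + 1 : i + 1 + block_size]).astype(np.int64) for i in ix])
    return x, y


# Toy dataset (assumption): 1000 consecutive token ids written to a temp file.
tmp_path = os.path.join(tempfile.mkdtemp(), "train.bin")
np.arange(1000, dtype=np.uint16).tofile(tmp_path)

rng = np.random.default_rng(0)
x, y = get_batch(tmp_path, batch_size=4, block_size=8, rng=rng)
```

Because the memmap object goes out of scope when the function returns, only the copied batch arrays stay referenced between calls.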
2024-01-25 11:41:01 -08:00
Andrej
eba36e8464
Merge pull request #309 from ho2103/master
...
Fix AssertionError on macOS - need to check CUDA availability for bf16
2023-06-22 08:24:17 -07:00
o
1eaceae193
Fix AssertionError on macOS - need to check CUDA availability for bf16
2023-06-19 18:05:09 -04:00
Andrej
4eb7a96b07
Merge pull request #305 from okuvshynov/fix_osx_dataload
...
nanogpt: fix multiprocessing in load_dataset on os x
2023-06-17 20:26:35 -07:00
Oleksandr Kuvshynov
542ac51d1f
nanogpt: fix multiprocessing in load_dataset on os x
...
The issue seems to be that _fixup_main_from_path in Python's multiprocessing
module is unable to find the entry point, so this adds
```
if __name__ == '__main__':
```
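A rough sketch of why the guard matters, using the standard-library multiprocessing module as a stand-in for the multi-process dataset loading. `count_tokens`, `main`, and the toy examples are hypothetical names, not code from this commit:

```python
import multiprocessing


def count_tokens(example):
    # Hypothetical per-example work; a whitespace split stands in for a tokenizer.
    return {"len": len(example["text"].split())}


def main():
    examples = [{"text": "to be or not to be"}, {"text": "that is the question"}]
    # Worker processes started with the "spawn" method (the default on macOS)
    # re-import this module, so the pool must not be created at import time.
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(count_tokens, examples)


if __name__ == '__main__':
    # Only the process that was launched as the script reaches this branch.
    print(main())
```

Without the guard, each spawned worker would re-run the top-level code while importing the module and try to start a pool of its own.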
2023-06-17 20:35:38 -04:00
Andrej
41d7014f7d
Merge pull request #301 from okuvshynov/master
...
[easy] allow multithreading in load_dataset
2023-06-16 18:30:03 -07:00
Oleksandr Kuvshynov
bb7e96754a
nanogpt: allow multithreading in load dataset
2023-06-16 20:00:17 -04:00
Andrej Karpathy
7339b904ef
use WORLD_SIZE instead of device_count. this supports both the case where the number of gpus we train on is smaller than the number of gpus available, and multinode training. may be a bugfix
2023-06-14 23:33:07 +00:00
Andrej
f08abb45bd
Merge pull request #274 from apivovarov/gelu
...
Use nn.GELU - 1.27x faster training
2023-06-14 16:25:15 -07:00
Andrej
18ee6b62b6
Merge pull request #275 from apivovarov/rm_unsqueeze
...
Remove pos unsqueeze(0)
2023-06-14 15:38:45 -07:00
Andrej
ed7887c888
Merge pull request #270 from LaihoE/master
...
fix np.sum overflows on windows
2023-06-14 15:36:26 -07:00
Andrej
8020bb582b
Merge pull request #276 from apivovarov/gitign
...
Add more files to .gitignore
2023-06-14 15:30:39 -07:00
Andrej
0f06d9b889
Merge pull request #277 from apivovarov/is_bf16_supported
...
Use bf16 only if supported
2023-06-14 15:29:50 -07:00
Andrej
cf4835ed6f
Merge pull request #286 from ctjlewis/master
...
docs: simplify dependencies installation
2023-06-14 15:21:04 -07:00
Lewis
eeac8732b9
docs: simplify dependencies installation
...
Adds a `pip install ...` command that will install all necessary dependencies, while retaining original dependency notes. Added quick description of `tqdm` as well.
2023-05-31 23:04:08 -05:00
Alexander Pivovarov
eb33b8bf1c
Use bf16 only if supported
2023-05-17 03:26:48 +00:00
Alexander Pivovarov
b120c421bf
Add more files to .gitignore
2023-05-17 02:50:22 +00:00
Alexander Pivovarov
39ae397a93
Remove pos unsqueeze(0)
2023-05-17 02:30:18 +00:00
Alexander Pivovarov
594068e7ae
Use nn.GELU
2023-05-17 00:53:35 +00:00
Laiho
6649b299eb
np.sum overflows on windows
2023-05-09 16:36:59 +03:00
Andrej Karpathy
7fe4a099ad
simplify configure_optimizers by a lot
2023-05-06 14:40:28 +00:00
Andrej
196160b849
Merge pull request #247 from gnobre/macbook-run-instructions
...
Macbook run instructions
2023-04-17 20:16:31 -07:00
Andrej
21f9bff7e4
Merge pull request #225 from otaviogood/grad_accum
...
Fix for gradient_accumulation_steps training slow
2023-04-17 20:11:25 -07:00
Andrej
a6a708c7f1
Merge branch 'master' into grad_accum
2023-04-17 20:11:00 -07:00
Guilherme Nobre
e30c8fda23
Merge branch 'karpathy:master' into macbook-run-instructions
2023-04-15 09:50:58 +01:00
Guilherme
4732c43af3
add macbook specific instructions to generate samples
2023-04-15 09:49:38 +01:00
Andrej
d9f4735f5e
Merge pull request #10 from LaihoE/master
...
batch file write
2023-04-13 00:39:41 -07:00
Andrej
b288f4cfb2
Merge pull request #146 from lutzroeder/master
...
Add .gitignore
2023-04-12 22:48:37 -07:00
Andrej
079df20748
Merge pull request #74 from venusatuluri/fix_decode
...
Small fix to decode fn in shakespeare_char/prepare.py
2023-04-12 22:45:01 -07:00
Andrej
01e48ec1ab
Merge pull request #240 from YassineYousfi/master
...
don't dropout in eval mode
2023-04-12 22:43:59 -07:00
Andrej
7840a66859
Merge pull request #54 from MicroPanda123/luv
...
Give tqdm some love :)
2023-04-12 22:25:18 -07:00
Andrej
8abe215fba
Merge pull request #128 from abrahamsangha/fix-typo
...
fix typo
2023-04-12 22:24:41 -07:00
Andrej
ad62003d7a
Merge pull request #142 from kovkev/patch-1
...
Fix the position of a comma
2023-04-12 22:24:06 -07:00
Andrej
ea24604b29
Merge pull request #220 from python273/patch-1
...
Fix GPT.crop_block_size when flash attention is available
2023-04-12 22:13:01 -07:00
Andrej
8aeea6d970
Merge pull request #224 from SnehalRaj/patch-1
...
fix small typo
2023-04-12 22:12:26 -07:00
Andrej
2457471c9c
Merge pull request #236 from ymurenko/master
...
fix "cuda out of memory" when resuming training
2023-04-12 22:09:42 -07:00
Andrej Karpathy
553f949f46
fix minor bug where we have to scale the loss to account for gradient accumulation, which sums before backprop. note that this is not a major bug because AdamW is scale invariant. however, this did affect gradient clipping
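The scaling issue can be illustrated with a small NumPy sketch. It uses a toy least-squares loss with an analytic gradient rather than the training script itself, and `mse_grad` and `accum_steps` are illustrative names:

```python
import numpy as np


def mse_grad(w, x, y):
    # Gradient of mean((x @ w - y) ** 2) with respect to w.
    return 2.0 * x.T @ (x @ w - y) / len(y)


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w = rng.normal(size=3)

accum_steps = 4
micro_x = np.split(x, accum_steps)
micro_y = np.split(y, accum_steps)

# Gradient of the mean loss over the full batch.
full_grad = mse_grad(w, x, y)

# Accumulating per-micro-batch mean-loss gradients sums them, so the
# result is accum_steps times larger than the full-batch gradient.
unscaled = sum(mse_grad(w, mx, my) for mx, my in zip(micro_x, micro_y))

# Dividing each micro-batch loss (and hence its gradient) by accum_steps
# recovers the full-batch gradient.
scaled = sum(mse_grad(w, mx, my) / accum_steps for mx, my in zip(micro_x, micro_y))

# The unscaled gradient norm is accum_steps times the true norm, which is
# the quantity a norm-based clipping threshold is compared against.
norm_ratio = np.linalg.norm(unscaled) / np.linalg.norm(full_grad)
```

With equally sized micro-batches, the unscaled accumulation changes only the gradient's magnitude and not its direction, so the visible effect is on a fixed clipping threshold.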
2023-04-13 04:59:11 +00:00
Yassine Yousfi
7399dfe39d
dont always dropout!
2023-04-10 22:56:22 -07:00
ymurenko
4ac2e8ce3a
fix "cuda out of memory" when resuming training
2023-04-05 17:28:55 -04:00
Snehal Raj
c58fc4605c
fix small typo
2023-03-25 20:36:46 +01:00
Otavio Good
978d4fe538
Fix for gradient_accumulation_steps training slow
2023-03-25 00:04:45 -07:00
Kirill
c3f254844d
Fix GPT.crop_block_size when flash attention is available
2023-03-24 14:51:02 +03:00
Andrej
a82b33b525
Merge pull request #199 from ChristianOrr/patch-1
...
bugfix in decode function
2023-03-12 13:40:20 -07:00
Christian Orr
36c7db8c44
bugfix in decode function
...
Return was left out of the decoder, so it didn't work.
2023-03-08 10:16:19 +02:00
Andrej
0d8fbd11ae
Merge pull request #195 from drisspg/enable_sdpa_with_nonzero_dropout
...
Enable sdpa for nonzero dropout
2023-03-06 21:47:20 -08:00
Driss Guessous
6170531b8a
enable sdpa for nonzero dropout
2023-03-05 19:29:29 +00:00
Andrej
ae3a8d5fdd
Merge pull request #145 from otaviogood/gradAccumStability
...
fix for training stability on single GPU
2023-02-14 18:48:54 -08:00
Lutz Roeder
10046a2ec0
Add .gitignore
2023-02-13 13:57:20 -08:00
Otavio Good
086ebe1822
fix for training stability on single GPU
2023-02-13 10:42:44 -08:00
kovkev
c2531159c7
Fix the position of a comma
2023-02-11 17:13:24 -08:00