Andrej
cf4835ed6f
Merge pull request #286 from ctjlewis/master
docs: simplify dependencies installation
2023-06-14 15:21:04 -07:00
Lewis
eeac8732b9
docs: simplify dependencies installation
Adds a `pip install ...` command that installs all necessary dependencies, while retaining the original dependency notes. Also adds a quick description of `tqdm`.
2023-05-31 23:04:08 -05:00
Andrej Karpathy
7fe4a099ad
simplify configure_optimizers by a lot
2023-05-06 14:40:28 +00:00
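The simplification in commit 7fe4a099ad boils down to choosing weight-decay groups by tensor rank instead of walking module types. A torch-free sketch of that rule, using hypothetical parameter names and shapes:

```python
def split_param_groups(named_shapes, weight_decay):
    # Decide by rank alone: tensors with 2+ dimensions (matmul weights,
    # embeddings) get weight decay; 1-D tensors (biases, LayerNorm gains)
    # do not.
    decay = [n for n, s in named_shapes.items() if len(s) >= 2]
    no_decay = [n for n, s in named_shapes.items() if len(s) < 2]
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]
```

In the real code the two groups are then handed to `torch.optim.AdamW` as parameter groups.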
Andrej
196160b849
Merge pull request #247 from gnobre/macbook-run-instructions
Macbook run instructions
2023-04-17 20:16:31 -07:00
Andrej
21f9bff7e4
Merge pull request #225 from otaviogood/grad_accum
Fix for gradient_accumulation_steps training slow
2023-04-17 20:11:25 -07:00
Andrej
a6a708c7f1
Merge branch 'master' into grad_accum
2023-04-17 20:11:00 -07:00
Guilherme Nobre
e30c8fda23
Merge branch 'karpathy:master' into macbook-run-instructions
2023-04-15 09:50:58 +01:00
Guilherme
4732c43af3
add macbook specific instructions to generate samples
2023-04-15 09:49:38 +01:00
Andrej
d9f4735f5e
Merge pull request #10 from LaihoE/master
batch file write
2023-04-13 00:39:41 -07:00
Andrej
b288f4cfb2
Merge pull request #146 from lutzroeder/master
Add .gitignore
2023-04-12 22:48:37 -07:00
Andrej
079df20748
Merge pull request #74 from venusatuluri/fix_decode
Small fix to decode fn in shakespeare_char/prepare.py
2023-04-12 22:45:01 -07:00
Andrej
01e48ec1ab
Merge pull request #240 from YassineYousfi/master
don't dropout in eval mode
2023-04-12 22:43:59 -07:00
Andrej
7840a66859
Merge pull request #54 from MicroPanda123/luv
Give tqdm some love :)
2023-04-12 22:25:18 -07:00
Andrej
8abe215fba
Merge pull request #128 from abrahamsangha/fix-typo
fix typo
2023-04-12 22:24:41 -07:00
Andrej
ad62003d7a
Merge pull request #142 from kovkev/patch-1
Fix the position of a comma
2023-04-12 22:24:06 -07:00
Andrej
ea24604b29
Merge pull request #220 from python273/patch-1
Fix GPT.crop_block_size when flash attention is available
2023-04-12 22:13:01 -07:00
Andrej
8aeea6d970
Merge pull request #224 from SnehalRaj/patch-1
fix small typo
2023-04-12 22:12:26 -07:00
Andrej
2457471c9c
Merge pull request #236 from ymurenko/master
fix "cuda out of memory" when resuming training
2023-04-12 22:09:42 -07:00
Andrej Karpathy
553f949f46
fix minor bug where we have to scale the loss to account for gradient accumulation, which sums before backprop. note that this is not a major bug because AdamW is scale invariant. however, this did affect gradient clipping
2023-04-13 04:59:11 +00:00
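The loss-scaling fix in commit 553f949f46 can be illustrated with plain numbers (the gradient values below are made up): summing micro-batch gradients without scaling inflates their magnitude by the accumulation factor, which is harmless to AdamW's update direction but changes what gradient clipping sees.

```python
def accumulated_grad(micro_grads, scale_loss):
    # backward() over each micro-batch *sums* gradients; scaling each loss
    # by 1/steps makes the accumulated sum equal the full-batch mean.
    n = len(micro_grads)
    scale = 1.0 / n if scale_loss else 1.0
    return sum(g * scale for g in micro_grads)

micro = [1.0, 2.0, 3.0, 6.0]  # hypothetical per-micro-batch gradients
unscaled = accumulated_grad(micro, scale_loss=False)  # 12.0, 4x too large
scaled = accumulated_grad(micro, scale_loss=True)     # 3.0, the true mean
```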
Yassine Yousfi
7399dfe39d
dont always dropout!
2023-04-10 22:56:22 -07:00
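The fix above exists because the functional attention path takes a raw `dropout_p` float and, unlike an `nn.Dropout` module, knows nothing about train/eval mode. The gate, sketched without torch:

```python
def sdpa_dropout_p(dropout, training):
    # nn.Dropout turns itself off under model.eval(); a bare dropout_p
    # argument does not, so the caller must gate it on training mode.
    return dropout if training else 0.0
```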
ymurenko
4ac2e8ce3a
fix "cuda out of memory" when resuming training
2023-04-05 17:28:55 -04:00
Snehal Raj
c58fc4605c
fix small typo
2023-03-25 20:36:46 +01:00
Otavio Good
978d4fe538
Fix for gradient_accumulation_steps training slow
2023-03-25 00:04:45 -07:00
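One way to read this fix is that `gradient_accumulation_steps` becomes a global setting split across DDP processes, so adding GPUs no longer multiplies the work per iteration. A sketch of that division (the function name is illustrative, and the assertion mirrors the divisibility constraint such a scheme needs):

```python
def per_process_accum_steps(global_accum_steps, world_size):
    # Each DDP process takes its share of the accumulation window, keeping
    # the effective batch size constant as GPUs are added or removed.
    assert global_accum_steps % world_size == 0, "must divide evenly"
    return global_accum_steps // world_size
```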
Kirill
c3f254844d
Fix GPT.crop_block_size when flash attention is available
2023-03-24 14:51:02 +03:00
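The `crop_block_size` fix guards against the flash-attention path, where the precomputed causal-mask buffer (registered as `bias` on the slow path) never exists. A self-contained sketch with stand-in classes (the class names are illustrative, not the repository's):

```python
class SlowAttn:
    """Stand-in for the non-flash attention path, which keeps a
    precomputed causal mask in an attribute named "bias"."""
    def __init__(self, block_size):
        self.bias = [[1] * block_size for _ in range(block_size)]

class FlashAttn:
    """Stand-in for the flash-attention path: no mask buffer at all."""
    pass

def crop_block_size(attn, block_size):
    # The fix: guard the crop with hasattr, because under flash attention
    # the "bias" mask was never registered and slicing it would crash.
    if hasattr(attn, "bias"):
        attn.bias = [row[:block_size] for row in attn.bias[:block_size]]
```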
Andrej
a82b33b525
Merge pull request #199 from ChristianOrr/patch-1
bugfix in decode function
2023-03-12 13:40:20 -07:00
Christian Orr
36c7db8c44
bugfix in decode function
Return was left out of the decoder, so it didn't work.
2023-03-08 10:16:19 +02:00
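This fix is a one-liner worth spelling out: the decode helper built its string but never returned it, so callers got `None`. With a hypothetical two-character `itos` table:

```python
itos = {0: "h", 1: "i"}  # made-up character table for illustration

def decode(l):
    # the fix: actually return the joined string instead of discarding it
    return "".join(itos[i] for i in l)
```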
Andrej
0d8fbd11ae
Merge pull request #195 from drisspg/enable_sdpa_with_nonzero_dropout
Enable sdpa for nonzero dropout
2023-03-06 21:47:20 -08:00
Driss Guessous
6170531b8a
enable sdpa for nonzero dropout
2023-03-05 19:29:29 +00:00
Andrej
ae3a8d5fdd
Merge pull request #145 from otaviogood/gradAccumStability
fix for training stability on single GPU
2023-02-14 18:48:54 -08:00
Lutz Roeder
10046a2ec0
Add .gitignore
2023-02-13 13:57:20 -08:00
Otavio Good
086ebe1822
fix for training stability on single GPU
2023-02-13 10:42:44 -08:00
kovkev
c2531159c7
Fix the position of a comma
2023-02-11 17:13:24 -08:00
Andrej Karpathy
55c5069696
fix misinformation in readme
2023-02-10 16:34:46 +00:00
Andrej Karpathy
e58f0cfa94
oops i should not be needing or multiplying by world_size to calculate mfu
2023-02-07 21:38:39 +00:00
Abraham Sangha
27a5d6f123
fix typos
2023-02-07 11:02:20 -07:00
Andrej Karpathy
8b1e43209e
small tweaks, make default WD be 0.1 as is often cited, and remove spurious init of LayerNorm, which is already initialized at 1,0
2023-02-06 23:07:25 +00:00
Andrej Karpathy
ab21d6c15d
bugfix we have to call the raw_model's estimate_mfu ty @jprobichaud for original PR
2023-02-06 19:55:35 +00:00
Andrej Karpathy
f83dd034e1
also add a sampling/inference section
2023-02-05 21:02:30 +00:00
Andrej Karpathy
23a8e701d2
revamp the readme file to be a bit better and more accessible, i hope
2023-02-05 19:31:32 +00:00
Andrej Karpathy
fce706cbe6
tune the hyperparams a bit, in configs
2023-02-05 19:31:18 +00:00
Andrej Karpathy
ab0718a7dd
add the estimation of model flops utilization (MFU), a very commonly looked at metric that estimates the token throughput in units of A100 bfloat16 peak flops (312 TFLOPS). this gives us a sense of the hardware utilization we're achieving
2023-02-05 00:48:58 +00:00
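The MFU metric added in commit ab0718a7dd compares achieved throughput against the A100's 312 TFLOPS bfloat16 peak. A minimal sketch using only the ~6 FLOPs-per-parameter-per-token rule of thumb (the repository's version may additionally account for attention FLOPs):

```python
A100_BF16_PEAK_FLOPS = 312e12  # A100 bfloat16 peak, per the commit message

def estimate_mfu(n_params, tokens_per_sec):
    # ~6 FLOPs per parameter per token covers the forward (2N) and
    # backward (4N) passes; MFU is achieved FLOP/s over hardware peak.
    achieved = 6 * n_params * tokens_per_sec
    return achieved / A100_BF16_PEAK_FLOPS
```

For example, a 124M-parameter model pushing 100k tokens/s would land at roughly 24% MFU under this estimate.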
Andrej Karpathy
580902617c
oops optimizer now demands to know device_type
2023-02-05 00:43:15 +00:00
Andrej Karpathy
34720df284
make more accurate the way in which we count parameters. previous count incorrectly included the positional encoding params, when typically only the number of weight parameters is reported for these models
2023-02-04 23:51:18 +00:00
Andrej Karpathy
3341b4cecc
oops forgot to subtract embedding params, which don't enter the 6ND equation
2023-02-04 22:33:35 +00:00
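The "6ND" referenced above is the standard estimate of total training compute: roughly 6 FLOPs per weight parameter per token (2 forward, 4 backward), with embedding parameters excluded since lookups are table indexing rather than matmuls. As a tiny helper:

```python
def training_flops(n_params_non_embedding, n_tokens):
    # C ~= 6 * N * D: N = weight (non-embedding) params, D = tokens seen.
    return 6 * n_params_non_embedding * n_tokens
```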
Andrej Karpathy
5a162bc773
fix silly error, i don't want to confuse a future GPT training on this notebook in the future
2023-02-04 22:11:16 +00:00
Andrej Karpathy
0bb96d3fff
add reference for 6ND to notebook too
2023-02-04 22:07:32 +00:00
Andrej Karpathy
eae986c2d2
new notebook with a bunch of calculations related to flops and memory of Transformer
2023-02-04 22:02:53 +00:00
Andrej Karpathy
a74e8363a2
clean up TODOs a bit, they are stale
2023-02-04 21:11:25 +00:00
Andrej Karpathy
25d95dbd65
mildly dramatic refactor for handling all these usage cases across all possible supported and unsupported devices for all the possible switches and flags
2023-02-04 21:06:17 +00:00
Andrej Karpathy
e108ffb973
very slight refactor, bit cleaner
2023-02-04 19:34:24 +00:00