mirror of https://github.com/osmarks/nanogpt-experiments.git synced 2024-09-21 03:39:44 +00:00

Commit Graph

  • 46c68ad1a2 nondeterminism test master osmarks 2024-07-23 11:48:41 +0100
  • 41ec7b3313 add image osmarks 2024-07-23 10:56:52 +0100
  • a225b756e8 fix things osmarks 2024-07-23 10:56:47 +0100
  • a64b2f2cfe tests osmarks 2024-07-08 19:36:49 +0100
  • f3118fe74d Add note about fix osmarks 2024-06-24 20:13:10 +0100
  • 0194d45e43 experiments osmarks 2024-06-24 19:10:15 +0000
  • 9755682b98 Merge pull request #463 from goswamig/test1 Andrej 2024-06-03 09:51:52 -0700
  • 3ab86ce851 Merge branch 'master' into test1 Andrej 2024-06-03 09:50:45 -0700
  • 7c7e627108 Merge pull request #487 from jellehak/patch-1 Andrej 2024-06-03 09:47:17 -0700
  • 5cb16fe66a Update README.md Jelle Hak 2024-05-28 08:38:35 +0200
  • 1ab9ec1b83 Fixing eval path in README Gautam Kumar 2024-03-23 23:51:02 -0700
  • 325be85d9b Merge pull request #420 from vinjn/fix-371-enc-is-not-defined Andrej 2024-02-27 09:27:01 -0800
  • a022d02ee2 Merge pull request #429 from adambala/fixes Andrej 2024-02-27 09:05:44 -0800
  • f68ac2200d Merge pull request #428 from kjslag/memmap-memory-leak Andrej 2024-02-27 08:41:24 -0800
  • f35dc82437 fix: prepare.py - added input file opening in UTF-8 encoding Adam Isakov 2024-01-26 01:34:44 +0300
  • b7e194a756 feature: .gitignore - added venv folders Adam Isakov 2024-01-26 01:10:10 +0300
  • 5156fef93c fix np.memmap memory leak Kevin Slagle 2024-01-25 11:41:01 -0800
  • dccf362c2b Move enc to global namespace vinjn 2024-01-12 12:47:42 -0800
  • eba36e8464 Merge pull request #309 from ho2103/master Andrej 2023-06-22 08:24:17 -0700
  • 1eaceae193 Fix AssertionError on macOS - need to check CUDA availability for bf16 o 2023-06-19 18:05:09 -0400
  • 4eb7a96b07 Merge pull request #305 from okuvshynov/fix_osx_dataload Andrej 2023-06-17 20:26:35 -0700
  • 542ac51d1f nanogpt: fix multiprocessing in load_dataset on os x Oleksandr Kuvshynov 2023-06-17 20:35:38 -0400
  • 41d7014f7d Merge pull request #301 from okuvshynov/master Andrej 2023-06-16 18:30:03 -0700
  • bb7e96754a nanogpt: allow multithreading in load dataset Oleksandr Kuvshynov 2023-06-16 20:00:17 -0400
  • 7339b904ef use WORLD_SIZE instead of device_count, supports both the case where the number of gpus we train on is smaller than gpus available, and also multinode training may be a bugfix Andrej Karpathy 2023-06-14 23:33:07 +0000
  • f08abb45bd Merge pull request #274 from apivovarov/gelu Andrej 2023-06-14 16:25:15 -0700
  • 18ee6b62b6 Merge pull request #275 from apivovarov/rm_unsqueeze Andrej 2023-06-14 15:38:45 -0700
  • ed7887c888 Merge pull request #270 from LaihoE/master Andrej 2023-06-14 15:36:26 -0700
  • 8020bb582b Merge pull request #276 from apivovarov/gitign Andrej 2023-06-14 15:30:39 -0700
  • 0f06d9b889 Merge pull request #277 from apivovarov/is_bf16_supported Andrej 2023-06-14 15:29:50 -0700
  • cf4835ed6f Merge pull request #286 from ctjlewis/master Andrej 2023-06-14 15:21:04 -0700
  • eeac8732b9 docs: simplify dependencies installation Lewis 2023-05-31 23:04:08 -0500
  • eb33b8bf1c Use bf16 only if supported Alexander Pivovarov 2023-05-17 03:26:48 +0000
  • b120c421bf Add more files to .gitignore Alexander Pivovarov 2023-05-17 02:44:21 +0000
  • 39ae397a93 Remove pos unsqueeze(0) Alexander Pivovarov 2023-05-17 02:30:18 +0000
  • 594068e7ae Use nn.GELU Alexander Pivovarov 2023-05-17 00:53:35 +0000
  • 6649b299eb np.sum overflows on windows Laiho 2023-05-09 16:36:59 +0300
  • 7fe4a099ad simplify configure_optimizers by a lot Andrej Karpathy 2023-05-06 14:40:28 +0000
  • 196160b849 Merge pull request #247 from gnobre/macbook-run-instructions Andrej 2023-04-17 20:16:31 -0700
  • 21f9bff7e4 Merge pull request #225 from otaviogood/grad_accum Andrej 2023-04-17 20:11:25 -0700
  • a6a708c7f1 Merge branch 'master' into grad_accum Andrej 2023-04-17 20:11:00 -0700
  • e30c8fda23 Merge branch 'karpathy:master' into macbook-run-instructions Guilherme Nobre 2023-04-15 09:50:58 +0100
  • 4732c43af3 add macbook specific instructions to generate samples Guilherme 2023-04-15 09:49:38 +0100
  • d9f4735f5e Merge pull request #10 from LaihoE/master Andrej 2023-04-13 00:39:41 -0700
  • b288f4cfb2 Merge pull request #146 from lutzroeder/master Andrej 2023-04-12 22:48:37 -0700
  • 079df20748 Merge pull request #74 from venusatuluri/fix_decode Andrej 2023-04-12 22:45:01 -0700
  • 01e48ec1ab Merge pull request #240 from YassineYousfi/master Andrej 2023-04-12 22:43:59 -0700
  • 7840a66859 Merge pull request #54 from MicroPanda123/luv Andrej 2023-04-12 22:25:18 -0700
  • 8abe215fba Merge pull request #128 from abrahamsangha/fix-typo Andrej 2023-04-12 22:24:41 -0700
  • ad62003d7a Merge pull request #142 from kovkev/patch-1 Andrej 2023-04-12 22:24:06 -0700
  • ea24604b29 Merge pull request #220 from python273/patch-1 Andrej 2023-04-12 22:13:01 -0700
  • 8aeea6d970 Merge pull request #224 from SnehalRaj/patch-1 Andrej 2023-04-12 22:12:26 -0700
  • 2457471c9c Merge pull request #236 from ymurenko/master Andrej 2023-04-12 22:09:42 -0700
  • 553f949f46 fix minor bug where we have to scale the loss to account for gradient accumulation, which sums before backprop. note that this is not a major bug because AdamW is scale invariant. however, this did affect gradient clipping Andrej Karpathy 2023-04-13 04:59:11 +0000
  • 7399dfe39d dont always dropout! Yassine Yousfi 2023-04-10 22:56:22 -0700
  • 4ac2e8ce3a fix "cuda out of memory" when resuming training ymurenko 2023-04-05 17:28:55 -0400
  • c58fc4605c fix small typo Snehal Raj 2023-03-25 20:36:46 +0100
  • 978d4fe538 Fix for gradient_accumulation_steps training slow Otavio Good 2023-03-25 00:04:45 -0700
  • c3f254844d Fix GPT.crop_block_size when flash attention is available Kirill 2023-03-24 14:51:02 +0300
  • a82b33b525 Merge pull request #199 from ChristianOrr/patch-1 Andrej 2023-03-12 13:40:20 -0700
  • 36c7db8c44 bugfix in decode function Christian Orr 2023-03-08 10:16:19 +0200
  • 0d8fbd11ae Merge pull request #195 from drisspg/enable_sdpa_with_nonzero_dropout Andrej 2023-03-06 21:47:20 -0800
  • 6170531b8a enable sdpa for nonzero dropout Driss Guessous 2023-03-05 19:29:29 +0000
  • ae3a8d5fdd Merge pull request #145 from otaviogood/gradAccumStability Andrej 2023-02-14 18:48:54 -0800
  • 10046a2ec0 Add .gitignore Lutz Roeder 2023-02-13 13:57:20 -0800
  • 086ebe1822 fix for training stability on single GPU Otavio Good 2023-02-13 10:42:44 -0800
  • c2531159c7 Fix the position of a comma kovkev 2023-02-11 17:13:24 -0800
  • 55c5069696 fix misinformation in readme Andrej Karpathy 2023-02-10 16:34:46 +0000
  • e58f0cfa94 oops i should not be needing or multiplying by world_size to calculate mfu Andrej Karpathy 2023-02-07 21:38:39 +0000
  • 27a5d6f123 fix typos Abraham Sangha 2023-02-07 11:02:20 -0700
  • 8b1e43209e small tweaks, make default WD be 0.1 as is often cited, and remove spurious init of LayerNorm, which is already initialized at 1,0 Andrej Karpathy 2023-02-06 23:07:25 +0000
  • ab21d6c15d bugfix we have to call the raw_model's estimate_mfu ty @jprobichaud for original PR Andrej Karpathy 2023-02-06 19:55:35 +0000
  • f83dd034e1 also add a sampling/inference section Andrej Karpathy 2023-02-05 21:02:30 +0000
  • 23a8e701d2 revamp the readme file to be a bit better and more accessible, i hope Andrej Karpathy 2023-02-05 19:31:32 +0000
  • fce706cbe6 tune the hyperparams a bit, in configs Andrej Karpathy 2023-02-05 19:31:18 +0000
  • ab0718a7dd add the estimation of model flops utilization (MFU), a very commonly looked at metric that estimates the token throughput in units of A100 bfloat16 peak flops (312 TFLOPS). this gives us a sense of the hardware utilization we're achieving Andrej Karpathy 2023-02-05 00:48:58 +0000
  • 580902617c oops optimizer now demands to know device_type Andrej Karpathy 2023-02-05 00:43:15 +0000
  • 34720df284 make more accurate the way in which we count parameters. previous count incorrectly included the positional encoding params, when typically only the number of weight parameters is reported for these models Andrej Karpathy 2023-02-04 23:51:18 +0000
  • 3341b4cecc oops forgot to subtract embedding params, which don't enter the 6ND equation Andrej Karpathy 2023-02-04 22:33:35 +0000
  • 5a162bc773 fix silly error, i don't want to confuse a future GPT training on this notebook in the future Andrej Karpathy 2023-02-04 22:11:16 +0000
  • 0bb96d3fff add reference for 6ND to notebook too Andrej Karpathy 2023-02-04 22:07:32 +0000
  • eae986c2d2 new notebook with a bunch of calculations related to flops and memory of Transformer Andrej Karpathy 2023-02-04 22:02:53 +0000
  • a74e8363a2 clean up TODOs a bit, they are stale Andrej Karpathy 2023-02-04 21:11:25 +0000
  • 25d95dbd65 mildly dramatic refactor for handing all these usage cases across all possible supported and unsupported devices for all the possible switches and flags Andrej Karpathy 2023-02-04 21:06:17 +0000
  • e108ffb973 very slight refactor, bit cleaner Andrej Karpathy 2023-02-04 19:34:24 +0000
  • dc149891b6 Merge pull request #120 from nynyg/remove_cpu_pin_mem Andrej 2023-02-04 11:28:08 -0800
  • b8286f343e Pin memory only when training on GPU Nan Yang 2023-02-04 11:16:26 -0800
  • 77e7e04c26 padding 50257 -> 50304 vocab_size, the nearest multiple of 64. the biggest deal smallest optimization i've made in recent past, about 25% faster. this is because the last layer is a major latency bottleneck consuming about 40% of latency due to the very high channel count. Andrej Karpathy 2023-02-04 16:06:18 +0000
  • b3c17c6c6a slight tweak compressing LOC Andrej Karpathy 2023-02-04 15:57:29 +0000
  • 53d56b82f1 Merge pull request #116 from ramtingh/master Andrej 2023-02-04 07:42:32 -0800
  • 9da1627c7f Explicitly set ddp device Ramtin Gharleghi 2023-02-04 15:07:36 +1100
  • 3fd4c0c5ef who needs a dataloader? overlap the prefetching of the next batch with GPU compute, hiding the data loading latency entirely. this saves about 1ms lol Andrej Karpathy 2023-02-04 02:52:48 +0000
  • 46428d3142 Merge pull request #115 from akashmjn/akashmjn/fix-notebook-stats Andrej 2023-02-03 17:23:44 -0800
  • d9a73374ed keep only what's needed Akash Mahajan 2023-02-03 15:13:13 -0800
  • 3969860ff5 include launch command too. anyone should be able to do this now Andrej Karpathy 2023-02-03 22:17:05 +0000
  • f9348f3f18 add gpt2 training config Andrej Karpathy 2023-02-03 22:14:37 +0000
  • 0e2c12b5ae add template .gitattributes that fixes language stats Akash Mahajan 2023-02-03 13:36:36 -0800
  • e170e40872 use the new fused AdamW from pytorch nightly, if available Andrej Karpathy 2023-02-03 17:56:51 +0000
  • 7d44bdf6b5 Merge pull request #106 from YassineYousfi/master Andrej 2023-02-02 17:23:22 -0800
  • 1e87509e47 if dropout > 0.0 disable Flash until pytorch fix. don't assert fail sigh Andrej Karpathy 2023-02-02 23:22:56 +0000
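
Note on commit 77e7e04c26: the padding it describes (50257 -> 50304, the nearest multiple of 64) can be checked with a short sketch. The helper name pad_to_multiple is hypothetical and used here only for illustration; it is not taken from the repository, and the commit may compute or hard-code the value differently.

```python
def pad_to_multiple(n: int, multiple: int = 64) -> int:
    """Round n up to the next multiple of `multiple` (hypothetical helper)."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    # Integer ceiling division, then scale back up.
    return ((n + multiple - 1) // multiple) * multiple


# 50257 is the vocabulary size named in the commit message.
gpt2_vocab_size = 50257
padded_vocab_size = pad_to_multiple(gpt2_vocab_size, 64)
print(padded_vocab_size)  # 50304
```

The padded size adds 47 unused token slots; the commit message attributes the speedup to the final layer's very high channel count.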