mirror of https://github.com/osmarks/nanogpt-experiments.git synced 2024-12-18 14:10:28 +00:00
Commit Graph

65 Commits

Author SHA1 Message Date
Andrej Karpathy
cf99914886 add gradient accumulation support to simulate larger batch sizes. ty @VHellendoorn for original PR 2023-01-15 17:49:55 +00:00
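A minimal sketch of the gradient-accumulation pattern this commit describes, under illustrative stand-ins (a toy `nn.Linear` model and an `accum_steps` of 4 rather than the repo's actual training loop): each micro-batch loss is scaled down so that the accumulated gradient matches one large-batch step.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)                       # toy stand-in for the GPT model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
accum_steps = 4                                # simulated batch = micro-batch * accum_steps

for step in range(8):
    x = torch.randn(2, 16)                     # a micro-batch
    y = torch.randint(0, 4, (2,))
    loss = nn.functional.cross_entropy(model(x), y)
    (loss / accum_steps).backward()            # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()                       # one update per simulated large batch
        optimizer.zero_grad(set_to_none=True)
```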
Andrej Karpathy
89da79eee1 add note of caution for the produced warning, investigate later 2023-01-14 20:38:22 +00:00
Andrej Karpathy
7d7ded25ce slightly better settings... for a single gpu at least; these settings would fry a simple cpu though, i think 2023-01-14 03:59:53 +00:00
Andrej Karpathy
91d02510ce fix bug... if top_k > vocab_size, torch.topk will throw an error 2023-01-14 03:57:00 +00:00
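A sketch of the guard this fix implies, assuming the sampler masks out logits below the k-th largest (the shapes and the vocab size of 65 are illustrative):

```python
import torch

logits = torch.randn(1, 65)                  # e.g. a char-level model's vocab
top_k = 200                                  # user-requested k, larger than the vocab

k = min(top_k, logits.size(-1))              # clamp so torch.topk never sees k > vocab_size
v, _ = torch.topk(logits, k)
logits[logits < v[:, [-1]]] = -float('inf')  # mask everything below the k-th logit
```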
Andrej Karpathy
57735f532d correctly propagate the vocab_size from the rendered dataset into the model args 2023-01-14 02:26:44 +00:00
Andrej Karpathy
43b37fd568 reverse the order, making sure that the final layer init is preserved, and becomes the token embedding instead of the other way around. otherwise the loss can be all messed up from a bad init 2023-01-14 02:16:10 +00:00
Andrej Karpathy
7c8288552b tie the weights of lm_head.weight and transformer.wte.weight, i.e. the last linear layer of decoder and the token embeddings. 2023-01-14 01:00:55 +00:00
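The two commits above are about weight tying. A sketch with illustrative sizes: embedding and output head end up sharing a single Parameter, and the direction of the assignment decides whose initialization survives, which is what the follow-up fix preserves.

```python
import torch.nn as nn

vocab_size, n_embd = 65, 128
wte = nn.Embedding(vocab_size, n_embd)               # token embedding
lm_head = nn.Linear(n_embd, vocab_size, bias=False)  # final decoder layer

# point the embedding at the head's weight, so the final layer's init is kept
wte.weight = lm_head.weight
assert wte.weight is lm_head.weight                  # one shared Parameter
```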
Andrej Karpathy
32b4f08d9d it's true 2023-01-13 23:43:00 +00:00
Andrej Karpathy
3e0fd42579 more scaling laws, clarification, and add simple interpolation of Approach 2 2023-01-13 00:57:15 +00:00
Andrej Karpathy
8f85b83347 inference-time mini-optimization, low-hanging fruit, ty @jxtps for raising: when we are running inference we can apply lm_head to only the very last token 2023-01-12 06:02:50 +00:00
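A sketch of that optimization, with illustrative shapes: during generation only the last position's logits feed the sampler, so the expensive vocab-sized projection can skip all earlier positions.

```python
import torch
import torch.nn as nn

n_embd, vocab_size = 128, 65
lm_head = nn.Linear(n_embd, vocab_size, bias=False)
hidden = torch.randn(1, 32, n_embd)          # (batch, seq, n_embd) from the transformer

logits_train = lm_head(hidden)               # training: logits at every position
logits_gen = lm_head(hidden[:, [-1], :])     # inference: project only the final token
print(logits_train.shape, logits_gen.shape)  # (1, 32, 65) vs (1, 1, 65)
```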
Andrej Karpathy
e21cbf887f meant to set always_save_checkpoint to False instead, so we only write when val improves 2023-01-12 05:47:34 +00:00
Andrej Karpathy
c1ac2d58f1 including transformers as a dependency of the repo as well 2023-01-12 02:42:38 +00:00
Andrej Karpathy
7f51d17977 add note about windows and pytorch 2.0 and torch compile in general 2023-01-12 02:17:52 +00:00
Andrej Karpathy
bb49751439 oh no nanoGPT is trending; quickly explain the character-level functionality I added late last night 2023-01-11 17:11:15 +00:00
Andrej Karpathy
d17350a31d add support for character-level language models, a new character-level shakespeare dataset, a new config file that shows how to train a character-level baby GPT on it, and adjust the sample function to figure out if it should decode with characters or GPT2 bpe tokens. The current implementation is a bit hacky and basically assumes just these two possibilities. In the future we may want to support more general encoders or decoders. 2023-01-11 05:27:19 +00:00
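A condensed, hypothetical version of what a character-level tokenizer like this looks like (the snippet of text is just a placeholder): every distinct character becomes its own token id, so decoding is a per-character lookup rather than GPT-2 BPE.

```python
text = "First Citizen:\nBefore we proceed any further, hear me speak."
chars = sorted(set(text))                        # the vocabulary is the distinct chars
stoi = {ch: i for i, ch in enumerate(chars)}     # char -> integer id
itos = {i: ch for i, ch in enumerate(chars)}     # integer id -> char
encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: ''.join(itos[i] for i in ids)
assert decode(encode("hear me speak")) == "hear me speak"
```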
Andrej Karpathy
c2a402f7f7 guess the config from globals() and log all of it with wandb 2023-01-11 01:00:22 +00:00
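A sketch of the globals()-scraping idea (the variable names are illustrative): collect every simple scalar or string setting at module level into one dict, then hand that dict to the logger.

```python
batch_size = 12            # example settings living as plain globals
learning_rate = 6e-4
dataset = 'openwebtext'

config_keys = [k for k, v in globals().items()
               if not k.startswith('_') and isinstance(v, (int, float, bool, str))]
config = {k: globals()[k] for k in config_keys}
print(config)              # e.g. pass as wandb.init(..., config=config)
```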
Andrej Karpathy
8b2e622b27 adjust the readme to reflect changes in the autocast branch 2023-01-08 19:40:46 +00:00
Andrej Karpathy
b77c2e86d3 copy-pasting what seems to work into bench and sample as well. ty @lantiga 2023-01-08 19:32:13 +00:00
Andrej Karpathy
a855d316fd add device and dtype support to train.py args 2023-01-08 19:20:38 +00:00
Andrej
e7cd674ce7
Merge pull request #20 from lantiga/wandb-optional-import
Make wandb import conditional on wandb_log=True
2023-01-08 10:19:40 -08:00
Luca Antiga
09f1f458e8 Move conditional import 2023-01-08 15:51:50 +01:00
Luca Antiga
aba47f0a35 Make wandb import conditional on wandb_log=True 2023-01-08 15:42:08 +01:00
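The pattern this PR introduces, roughly: defer the import so wandb is only required when logging is actually turned on (the project and run names below are made up).

```python
wandb_log = False          # flip to True to enable logging

if wandb_log:
    import wandb           # imported only when used, so wandb stays optional
    wandb.init(project='owt', name='gpt2-124M')
```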
Andrej Karpathy
e53b9d28ff ran readme through spellchecker heh 2023-01-08 01:46:54 +00:00
Andrej Karpathy
df3b8a57ab tune the readme with new header image and the loss curve for 124M 2023-01-08 00:41:14 +00:00
Andrej Karpathy
d56bdf05a6 progress! based on chinchilla author correspondence 2023-01-07 02:42:30 +00:00
Andrej Karpathy
27fc6a4112 small tweaks to notebook 2023-01-06 02:13:04 +00:00
Andrej Karpathy
69d1a5f1af update scaling laws. basically i can't reproduce any of the params, flops, or scaling laws of the Chinchilla paper atm... 2023-01-06 02:01:08 +00:00
Andrej Karpathy
9629093e53 minor args re-arranging and removing some spurious ones like wandb entity. ty @tcapelle 2023-01-05 01:14:02 +00:00
Andrej
529c967a65
Merge pull request #19 from nat/patch-1
Strip unwanted prefix from state keys when loading model in sample.py
2023-01-04 16:46:32 -08:00
Andrej Karpathy
d562b3e550 shuttling the poor man's configurator aside into its own file and adding it to all of train, sample, and bench. because i am leaving args in globals() so i can avoid having to prepend every single variable with an args. prefix, i have to exec the configurator and the optional configs. so we're left with something very gross by standard convention, but also quite simple and functional. *ducks* 2023-01-05 00:44:35 +00:00
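A condensed sketch of that exec-based configurator idea (argument handling simplified): defaults live as plain globals, a config .py file passed on the command line is exec'd into the same namespace, and --key=val arguments override individual globals.

```python
import sys
from ast import literal_eval

batch_size = 12                            # a default, overridable from outside

for arg in sys.argv[1:]:
    if arg.endswith('.py'):
        exec(open(arg).read())             # run a config file in this namespace
    elif arg.startswith('--') and '=' in arg:
        key, val = arg[2:].split('=', 1)
        try:
            val = literal_eval(val)        # '12' -> 12, 'True' -> True
        except (SyntaxError, ValueError):
            pass                           # otherwise keep it as a string
        globals()[key] = val               # override the global of that name
```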
Nat Friedman
2b9e168736 Strip unwanted prefix from state keys when loading model 2023-01-04 16:39:30 -08:00
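A sketch of the key-renaming fix (the toy dict stands in for a real checkpoint): strip the prefix, which torch.compile prepends as '_orig_mod.' to a wrapped model's parameter names, before handing the state dict to load_state_dict.

```python
unwanted_prefix = '_orig_mod.'                               # added by torch.compile
state_dict = {'_orig_mod.lm_head.weight': 0, 'iter_num': 1}  # toy stand-in
for k in list(state_dict):
    if k.startswith(unwanted_prefix):
        state_dict[k[len(unwanted_prefix):]] = state_dict.pop(k)
print(state_dict)  # {'iter_num': 1, 'lm_head.weight': 0}
```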
Andrej Karpathy
ab04701f9f mention current 8GPU SOTA and shuffle sections a bit 2023-01-04 18:59:10 +00:00
Andrej
1eefbb2520
Merge pull request #16 from jorahn/patch-1
Update README.md
2023-01-04 09:08:50 -08:00
Jonathan Rahn
26aa5f3ead
Update README.md 2023-01-04 10:28:13 +01:00
Andrej Karpathy
c72ecf5d93 add a notebook trying to reproduce chinchilla scaling laws. I can't get the numbers to be exactly right, have to look at it more 2023-01-04 00:59:34 +00:00
Andrej Karpathy
5acba4b005 ty lambda labs 2023-01-03 21:16:07 +00:00
Andrej Karpathy
97fc42616e adding a few more dependencies 2023-01-03 17:54:48 +00:00
Andrej Karpathy
9f95aca93e better hyperparams for gpt2 124M model on A100 40GB. still uncertain about max_iters especially, and a bit about weight decay, betas 2023-01-03 17:45:49 +00:00
Andrej Karpathy
b45eec3e4b flesh out the remaining TODOs in readme a bit more 2023-01-03 07:41:28 +00:00
Andrej Karpathy
177d5f7dc5 disabling torch.jit.script here for massive performance boost when using torch.compile, our default. see issue #11. thanks @vgoklani for flagging 2023-01-02 23:05:01 +00:00
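For reference, the function in question is the GPT-2 tanh-approximation GELU; per this commit the point is simply to leave the @torch.jit.script decorator off so torch.compile can do its own fusion (reconstructed here from the standard formula):

```python
import math
import torch

def new_gelu(x):
    # GPT-2's tanh approximation of GELU; no @torch.jit.script on purpose
    return 0.5 * x * (1.0 + torch.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * torch.pow(x, 3.0))))

print(new_gelu(torch.randn(4)))
```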
Andrej Karpathy
ea4de192e0 reshuffle args inside sample.py 2023-01-02 02:11:39 +00:00
Andrej Karpathy
ec9b1f8182 add a patch to fix a mysterious unwanted prefix in the state dict? maybe remove later 2023-01-02 01:25:02 +00:00
Andrej Karpathy
41184a27f5 rename compile_model to compile, shorter, version 2 stragglers 2023-01-02 01:15:55 +00:00
Andrej Karpathy
35f51974c4 rename to compile, it's shorter 2023-01-02 01:14:46 +00:00
Andrej Karpathy
2febf4463c candidate changes to apis, have to think through more 2023-01-01 01:29:48 +00:00
Andrej Karpathy
7c6ea8409e simplify the prepare script a lot, write only using one process, seems sufficient for now. ty @LaihoE for suggestion and @proger for flagging 2022-12-30 22:18:20 +00:00
Andrej Karpathy
d8abd21258 typo fix in readme 2022-12-30 00:07:58 +00:00
Andrej Karpathy
5a725d9098 add torch.compile by default, shows almost a 1.8X improvement in throughput, nice 2022-12-30 00:07:13 +00:00
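Enabling it is a one-liner in PyTorch 2.0; a minimal sketch with a toy model standing in for the GPT:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)            # stand-in for the GPT model
if hasattr(torch, 'compile'):      # torch.compile requires PyTorch 2.0+
    model = torch.compile(model)   # JIT-compiles the forward pass
```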
Andrej
fb52554ca8
Merge pull request #1 from ankandrew/master
Minor Frozen GPTConfig
2022-12-29 13:45:20 -08:00
ankandrew
7f0e6d9a71 Frozen GPTConfig 2022-12-29 17:07:19 -03:00
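A sketch of what a frozen config dataclass looks like (the field names match nanoGPT's GPTConfig, the defaults are illustrative GPT-2-ish values): frozen=True makes instances immutable, so settings can't be mutated accidentally after model construction.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GPTConfig:
    block_size: int = 1024
    vocab_size: int = 50257
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768

cfg = GPTConfig(n_layer=6)
# cfg.n_layer = 12  # would raise dataclasses.FrozenInstanceError
```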