nanogpt-experiments

mirror of https://github.com/osmarks/nanogpt-experiments.git synced 2024-12-18 14:10:28 +00:00

Author	SHA1	Message	Date
Andrej Karpathy	8f85b83347	inference time mini-optimization low-hanging fruit ty @jxtps for raising: when we are running inference we can apply lm_head on only the very last token	2023-01-12 06:02:50 +00:00
Andrej Karpathy	d17350a31d	add support for character-level language models, a new character-level shakespeare dataset, a new config file that shows how to train a character-level baby GPT on it, and adjust the sample function to figure out if it should decode with characters or GPT2 bpe tokens. The current implementation is a bit hacky and basically assumes just these two possibilities. In the future we may want to support more general encoders or decoders.	2023-01-11 05:27:19 +00:00
Andrej Karpathy	c2a402f7f7	guess the config from globals() and log all of it with wandb	2023-01-11 01:00:22 +00:00
Andrej Karpathy	a855d316fd	add device and dtype support to train.py args	2023-01-08 19:20:38 +00:00
Luca Antiga	09f1f458e8	Move conditional import	2023-01-08 15:51:50 +01:00
Luca Antiga	aba47f0a35	Make wandb import conditioned to wandb_log=True	2023-01-08 15:42:08 +01:00
Andrej Karpathy	9629093e53	minor args re-arranging and removing some spurious ones like wandb entity ty @tcapelle	2023-01-05 01:14:02 +00:00
Andrej Karpathy	d562b3e550	shuttling the poor mans configurator aside into its own file and adding it to all of train,sample,bench. because i am leaving args in globals() so i can avoid having to prepend every single variable with an args., i have to exec the configurator and the optional configs. so we're left with something very gross by standard convention but also quite simple and functional. ducks	2023-01-05 00:44:35 +00:00
Andrej Karpathy	9f95aca93e	better hyperparams for gpt2 124M model on A100 40GB. still uncertain about max_iters especially, and a bit about weight decay, betas	2023-01-03 17:45:49 +00:00
Andrej Karpathy	ec9b1f8182	add a patch to fix mysterious unwanted prefix in state dict? maybe remove later	2023-01-02 01:25:02 +00:00
Andrej Karpathy	35f51974c4	rename to compile it's shorter	2023-01-02 01:14:46 +00:00
Andrej Karpathy	2febf4463c	candidate changes to apis, have to think through more	2023-01-01 01:29:48 +00:00
Andrej Karpathy	5a725d9098	add torch.compile by default, shows almost 1.8X improvement in throughput nice	2022-12-30 00:07:13 +00:00
Andrej Karpathy	682a0ac8f1	properly resume training, also loading iter_num and best_val_loss from checkpoints	2022-12-29 18:23:15 +00:00
Andrej Karpathy	dea1507252	add support for DDP training. the scaling timings right now do not look good by default, have to dig more into	2022-12-29 05:06:07 +00:00
Andrej Karpathy	5d2b4807bf	adding a lightweight configurator that may be a terrible mistake lol. also adding configs to evaluate the baseline GPT2 versions released by OpenAI on OWT. we have some ways to go to match those numbers atm	2022-12-28 23:31:23 +00:00
Andrej Karpathy	c9fe00c0e9	small readme clarification and training script defaults changes	2022-12-28 01:45:55 +00:00
Andrej Karpathy	fe8042867c	first very bad commit	2022-12-28 00:58:19 +00:00

18 Commits
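
Several of the commits above (5d2b4807bf, d562b3e550, c2a402f7f7) describe a lightweight configurator: settings live as plain module-level variables, command-line arguments override them, and the resulting config is guessed from globals() for logging. The following is a minimal sketch of that idea, not the repository's actual code. It works on an ordinary dict instead of real globals() so that it can run on its own, and the names apply_overrides and collect_config, the --key=value argument form, and the type check are assumptions made for illustration.

```python
from ast import literal_eval


def apply_overrides(namespace, argv):
    """Override existing keys in `namespace` from '--key=value' strings.

    `namespace` stands in for a script's globals(); hypothetical helper.
    """
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            raise ValueError(f"expected --key=value, got {arg!r}")
        key, val = arg[2:].split("=", 1)
        if key not in namespace:
            raise ValueError(f"unknown config key: {key}")
        try:
            # Parse numbers, booleans, etc. from their literal form.
            parsed = literal_eval(val)
        except (SyntaxError, ValueError):
            # Not a Python literal, so keep it as a plain string.
            parsed = val
        if type(parsed) is not type(namespace[key]):
            raise TypeError(f"type mismatch for {key}: {val!r}")
        namespace[key] = parsed
    return namespace


def collect_config(namespace):
    """Guess the config: public names bound to simple scalar values."""
    return {
        k: v
        for k, v in namespace.items()
        if not k.startswith("_") and isinstance(v, (int, float, bool, str))
    }


if __name__ == "__main__":
    # Illustrative defaults only; not the repository's hyperparameters.
    settings = {"batch_size": 12, "device": "cuda", "wandb_log": False}
    apply_overrides(settings, ["--batch_size=32", "--device=cpu"])
    print(collect_config(settings))
```

In a real script the dict would be replaced by globals(), which is what lets the training code refer to each setting as a bare variable without an args. prefix, at the cost of the unconventional exec-style setup that the commit message itself points out.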