nanogpt-experiments

Mirror of https://github.com/osmarks/nanogpt-experiments.git, synced 2024-11-10 20:09:58 +00:00.

Author	SHA1	Message	Date
Otavio Good	978d4fe538	Fix for gradient_accumulation_steps training slow	2023-03-25 00:04:45 -07:00
Andrej Karpathy	fce706cbe6	tune the hyperparams a bit, in configs	2023-02-05 19:31:18 +00:00
Andrej Karpathy	7d7ded25ce	a bit better settings... for a single gpu at least. these settings would fry a simple cpu though i think	2023-01-14 03:59:53 +00:00
Andrej Karpathy	e21cbf887f	meant to set always_save_checkpoint to False instead, so we only write when val improves	2023-01-12 05:47:34 +00:00
Andrej Karpathy	d17350a31d	add support for character-level language models, a new character-level shakespeare dataset, a new config file that shows how to train a character-level baby GPT on it, and adjust the sample function to figure out if it should decode with characters or GPT2 bpe tokens. The current implementation is a bit hacky and basically assumes just these two possibilities. In the future we may want to support more general encoders or decoders.	2023-01-11 05:27:19 +00:00