nanogpt-experiments

mirror of https://github.com/osmarks/nanogpt-experiments.git synced 2024-09-21 03:39:44 +00:00

Author	SHA1	Message	Date
o	1eaceae193	Fix AssertionError on macOS - need to check CUDA availability for bf16	2023-06-19 18:05:09 -04:00
Alexander Pivovarov	eb33b8bf1c	Use bf16 only if supported	2023-05-17 03:26:48 +00:00
Andrej Karpathy	ab0718a7dd	add the estimation of model flops utilization (MFU), a very commonly looked at metric that estimates the token throughput in units of A100 bfloat16 peak flops (312 TFLOPS). this gives us a sense of the hardware utilization we're achieving	2023-02-05 00:48:58 +00:00
Andrej Karpathy	580902617c	oops optimizer now demands to know device_type	2023-02-05 00:43:15 +00:00
Andrej Karpathy	77e7e04c26	padding 50257 -> 50304 vocab_size, the nearest multiple of 64. the biggest deal smallest optimization i've made in recent past, about 25% faster. this is because the last layer is a major latency bottleneck consuming about 40% of latency due to the very high channel count.	2023-02-04 16:06:18 +00:00
Andrej Karpathy	3fd4c0c5ef	who needs a dataloader? overlap the prefetching of the next batch with GPU compute, hiding the data loading latency entirely. this saves about 1ms lol	2023-02-04 02:52:48 +00:00
Andrej Karpathy	d01863ef01	small usability tweaks to bench	2023-02-02 17:23:46 +00:00
Andrej Karpathy	e808a67149	bunch of plumbing of bias all around. measuring bias=False to be about 6% faster	2023-01-27 20:41:17 +00:00
Andrej Karpathy	b77c2e86d3	copy pasting what seems to work to bench, sample as well. ty @lantiga	2023-01-08 19:32:13 +00:00
Andrej Karpathy	d562b3e550	shuttling the poor man's configurator aside into its own file and adding it to all of train,sample,bench. because i am leaving args in globals() so i can avoid having to prepend every single variable with an args., i have to exec the configurator and the optional configs. so we're left with something very gross by standard convention but also quite simple and functional. ducks	2023-01-05 00:44:35 +00:00
Andrej Karpathy	41184a27f5	rename compile_model to compile, shorter, version 2 stragglers	2023-01-02 01:15:55 +00:00
Andrej Karpathy	5a725d9098	add torch.compile by default, shows almost 1.8X improvement in throughput nice	2022-12-30 00:07:13 +00:00
Andrej Karpathy	fa57d464d7	pull out dtype up top	2022-12-29 05:32:55 +00:00
Andrej Karpathy	3000cf5dda	add pytorch profiler support. not sure how to support both profiler and simple benchmarking, a bit gnarly atm hmm	2022-12-29 01:49:53 +00:00
Andrej Karpathy	b760ef1358	add data loading into benchmarking as well, just for completeness	2022-12-29 00:05:32 +00:00
Andrej Karpathy	70b5d93aee	add benchmarking script v0	2022-12-28 23:55:43 +00:00

16 Commits
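Commit 77e7e04c26 pads the vocabulary from 50257 to 50304, which means rounding up to the next multiple of 64. The sketch below shows that arithmetic. The helper name `pad_vocab_size` is hypothetical and is not taken from the repository.

```python
def pad_vocab_size(vocab_size: int, multiple: int = 64) -> int:
    """Round vocab_size up to the next multiple of `multiple` (hypothetical helper)."""
    # Ceiling division, then scale back up to a whole number of multiples.
    return ((vocab_size + multiple - 1) // multiple) * multiple


padded = pad_vocab_size(50257)
print(padded)  # 50304 (= 786 * 64), i.e. 47 extra token slots
```

The GPT-2 tokenizer never emits the 47 padded ids, so their only cost is a few extra rows in the embedding and output layer. The commit attributes the speedup to that output layer being a major latency bottleneck.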
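Commit ab0718a7dd adds a model FLOPs utilization (MFU) estimate, measured against the A100 bfloat16 peak of 312 TFLOPS. The sketch below is one common way to compute it, and it is an assumption rather than the repository's actual code. It uses the approximation from the PaLM paper's MFU definition: about 6N + 12·L·H·Q·T FLOPs per token for forward plus backward. Here N is the parameter count, L the number of layers, H the number of heads, Q the head dimension, and T the sequence length. The function name, its parameters, and the example timing are all hypothetical.

```python
# Peak dense bfloat16 throughput of an A100, as quoted in commit ab0718a7dd.
A100_BF16_PEAK_FLOPS = 312e12


def estimate_mfu(n_params: int, n_layer: int, n_head: int, head_dim: int,
                 seq_len: int, seqs_per_iter: int, dt: float,
                 peak_flops: float = A100_BF16_PEAK_FLOPS) -> float:
    """Return model FLOPs utilization as a fraction of peak_flops (hypothetical helper)."""
    # Forward + backward FLOPs per token: ~6 per parameter for the weight matmuls,
    # plus 12 * L * H * Q * T for the attention score and weighted-value matmuls.
    flops_per_token = 6 * n_params + 12 * n_layer * n_head * head_dim * seq_len
    # Each sequence has seq_len tokens; each iteration processes seqs_per_iter sequences.
    flops_per_iter = flops_per_token * seq_len * seqs_per_iter
    # Achieved throughput in FLOP/s, given dt seconds per iteration.
    flops_achieved = flops_per_iter / dt
    return flops_achieved / peak_flops


# Hypothetical GPT-2-small-like configuration and a made-up iteration time.
mfu = estimate_mfu(n_params=124_000_000, n_layer=12, n_head=12, head_dim=64,
                   seq_len=1024, seqs_per_iter=12, dt=0.5)
print(f"MFU: {mfu * 100:.2f}%")
```

Because the result divides achieved FLOP/s by a fixed hardware peak, it is only meaningful for the GPU and dtype that the peak constant describes.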