| Alexander Pivovarov | eb33b8bf1c | Use bf16 only if supported | 2023-05-17 03:26:48 +00:00 |
| Andrej Karpathy | ab0718a7dd | add the estimation of model flops utilization (MFU), a very commonly looked at metric that estimates the token throughput in units of A100 bfloat16 peak flops (312 TFLOPS). this gives us a sense of the hardware utilization we're achieving | 2023-02-05 00:48:58 +00:00 |
| Andrej Karpathy | 580902617c | oops optimizer now demands to know device_type | 2023-02-05 00:43:15 +00:00 |
| Andrej Karpathy | 77e7e04c26 | padding 50257 -> 50304 vocab_size, the nearest multiple of 64. the biggest deal smallest optimization i've made in recent past, about 25% faster. this is because the last layer is a major latency bottleneck consuming about 40% of latency due to the very high channel count. | 2023-02-04 16:06:18 +00:00 |
| Andrej Karpathy | 3fd4c0c5ef | who needs a dataloader? overlap the prefetching of the next batch with GPU compute, hiding the data loading latency entirely. this saves about 1ms lol | 2023-02-04 02:52:48 +00:00 |
| Andrej Karpathy | d01863ef01 | small usability tweaks to bench | 2023-02-02 17:23:46 +00:00 |
| Andrej Karpathy | e808a67149 | bunch of plumbing of bias all around. measuring bias=False to be about 6% faster | 2023-01-27 20:41:17 +00:00 |
| Andrej Karpathy | b77c2e86d3 | copy pasting what seems to work to bench,sample as well. ty @lantiga | 2023-01-08 19:32:13 +00:00 |
| Andrej Karpathy | d562b3e550 | shuttling the poor mans configurator aside into its own file and adding it to all of train,sample,bench. because i am leaving args in globals() so i can avoid having to prepend every single variable with an args., i have to exec the configurator and the optional configs. so we're left with something very gross by standard convention but also quite simple and functional. *ducks* | 2023-01-05 00:44:35 +00:00 |
| Andrej Karpathy | 41184a27f5 | rename compile_model to compile, shorter, version 2 stragglers | 2023-01-02 01:15:55 +00:00 |
| Andrej Karpathy | 5a725d9098 | add torch.compile by default, shows almost 1.8X improvement in throughput nice | 2022-12-30 00:07:13 +00:00 |
| Andrej Karpathy | fa57d464d7 | pull out dtype up top | 2022-12-29 05:32:55 +00:00 |
| Andrej Karpathy | 3000cf5dda | add pytorch profiler support. not sure how to support both profiler and simple benchmarking, a bit gnarly atm hmm | 2022-12-29 01:49:53 +00:00 |
| Andrej Karpathy | b760ef1358 | add data loading into benchmarking as well, just for completeness | 2022-12-29 00:05:32 +00:00 |
| Andrej Karpathy | 70b5d93aee | add benchmarking script v0 | 2022-12-28 23:55:43 +00:00 |
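
Two of the commits above describe simple arithmetic: padding `vocab_size` from 50257 up to the nearest multiple of 64 (50304), and expressing throughput as a fraction of the A100 bfloat16 peak (312 TFLOPS). The sketch below illustrates only that arithmetic. The function names `pad_to_multiple` and `mfu` are hypothetical and are not taken from the repository, and how the achieved FLOPS figure is estimated from the model is left out here as an assumption.

```python
A100_BF16_PEAK_FLOPS = 312e12  # 312 TFLOPS, the peak figure quoted in the commit message


def pad_to_multiple(n: int, multiple: int = 64) -> int:
    """Round n up to the nearest multiple of `multiple` (hypothetical helper)."""
    return ((n + multiple - 1) // multiple) * multiple


def mfu(achieved_flops_per_sec: float,
        peak_flops_per_sec: float = A100_BF16_PEAK_FLOPS) -> float:
    """Model flops utilization as achieved / peak (hypothetical helper).

    The caller must supply the achieved FLOPS estimate; how that is
    derived from the model and step time is not shown in this sketch.
    """
    return achieved_flops_per_sec / peak_flops_per_sec


if __name__ == "__main__":
    print(pad_to_multiple(50257))   # padded vocab size
    print(mfu(156e12))              # utilization for a hypothetical 156 TFLOPS run
```

A value already on a multiple of 64 is returned unchanged, so applying the padding twice has no further effect.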