mirror of https://github.com/osmarks/nanogpt-experiments.git synced 2024-11-13 05:19:58 +00:00

nanogpt: fix multiprocessing in load_dataset on os x

The issue seems to be that `_fixup_main_from_path` in Python's
multiprocessing module is unable to find the entry point, so this
commit puts the dataset loading under a main guard:
```
if __name__ == '__main__':
```
Oleksandr Kuvshynov 2023-06-17 20:35:38 -04:00
parent bb7e96754a
commit 542ac51d1f


@@ -16,6 +16,7 @@ num_proc = 8
# it is better than 1 usually though
num_proc_load_dataset = num_proc
if __name__ == '__main__':
    # takes 54GB in huggingface .cache dir, about 8M documents (8,013,769)
    dataset = load_dataset("openwebtext", num_proc=num_proc_load_dataset)
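
The guard matters because, under the "spawn" start method (the default on macOS since Python 3.8), each worker process re-imports the main script, so unguarded module-level code that starts workers would run again in every child. The following is a minimal sketch of the same pattern using only the standard library. The helper `document_lengths` and the sample documents are hypothetical stand-ins for the `load_dataset(..., num_proc=...)` call; this is an illustration, not the repository's code.

```python
import multiprocessing as mp


def document_lengths(docs, num_proc=2):
    """Hypothetical stand-in for load_dataset: measure each document
    in a pool of worker processes started with the "spawn" method."""
    ctx = mp.get_context("spawn")  # force spawn so the behaviour matches macOS
    with ctx.Pool(processes=num_proc) as pool:
        # len is a builtin, so it can be pickled and sent to the workers
        return pool.map(len, docs)


# Without this guard, every spawned worker would re-run the pool creation
# when it re-imports the main module, and multiprocessing would raise an error.
if __name__ == "__main__":
    docs = ["hello world", "nanogpt", ""]
    print(document_lengths(docs))
```

Forcing the start method through `mp.get_context("spawn")` keeps the sketch's behaviour the same on Linux, where the default start method may differ.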