Edit ‘meme_search_engine/reddit_dump’

osmarks 2024-09-23 09:37:56 +00:00 committed by wikimind
parent 8036d29953
commit f299fba2a1

@@ -1 +1 @@
-osmarks.net research teams designed [[https://github.com/osmarks/meme-search-engine/blob/master/src/reddit_dump.rs|code]] to download and embed all images ever posted (and not deleted) from Reddit (excluding NSFW, ads, etc), using streaming processing to avoid having to persist intractable amounts of data to disk. Unfortunately, it is still necessary to store embeddings, and the high cost of flash and electricity means a full run (it was tested at 0.5% sampling) has been postponed indefinitely.
+osmarks.net research teams designed [[https://github.com/osmarks/meme-search-engine/blob/master/src/reddit_dump.rs|code]] to download and embed all images ever posted (and not deleted) from Reddit (excluding NSFW, ads, etc), using streaming processing to avoid having to persist intractable amounts of data to disk. Unfortunately, it is still necessary to store embeddings, and the high cost of flash and electricity means a full run (it was tested at 0.5% sampling) has been postponed until local winter.