From f299fba2a138a229f8054cdbcc511c5c97e47cf6 Mon Sep 17 00:00:00 2001
From: osmarks
Date: Mon, 23 Sep 2024 09:37:56 +0000
Subject: [PATCH] =?UTF-8?q?Edit=20=E2=80=98meme=5Fsearch=5Fengine/reddit?=
 =?UTF-8?q?=5Fdump=E2=80=99?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 meme_search_engine/reddit_dump.myco | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meme_search_engine/reddit_dump.myco b/meme_search_engine/reddit_dump.myco
index 1cc8fc0..f642bde 100644
--- a/meme_search_engine/reddit_dump.myco
+++ b/meme_search_engine/reddit_dump.myco
@@ -1 +1 @@
-osmarks.net research teams designed [[https://github.com/osmarks/meme-search-engine/blob/master/src/reddit_dump.rs|code]] to download and embed all images ever posted (and not deleted) from Reddit (excluding NSFW, ads, etc), using streaming processing to avoid having to persist intractable amounts of data to disk. Unfortunately, it is still necessary to store embeddings, and the high cost of flash and electricity means a full run (it was tested at 0.5% sampling) has been postponed indefinitely.
\ No newline at end of file
+osmarks.net research teams designed [[https://github.com/osmarks/meme-search-engine/blob/master/src/reddit_dump.rs|code]] to download and embed all images ever posted (and not deleted) from Reddit (excluding NSFW, ads, etc), using streaming processing to avoid having to persist intractable amounts of data to disk. Unfortunately, it is still necessary to store embeddings, and the high cost of flash and electricity means a full run (it was tested at 0.5% sampling) has been postponed until local winter.
\ No newline at end of file