From 3cf0bc334c8a8b8095e9d75e35abfaaab923d7c7 Mon Sep 17 00:00:00 2001
From: osmarks
Date: Fri, 24 Jan 2025 12:23:29 +0000
Subject: [PATCH] =?UTF-8?q?Edit=20=E2=80=98meme=5Fsearch=5Fengine/reddit?= =?UTF-8?q?=5Fdump=E2=80=99?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 meme_search_engine/reddit_dump.myco | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/meme_search_engine/reddit_dump.myco b/meme_search_engine/reddit_dump.myco
index 282f934..d1ef9b7 100644
--- a/meme_search_engine/reddit_dump.myco
+++ b/meme_search_engine/reddit_dump.myco
@@ -1 +1,3 @@
-osmarks.net research teams designed [[https://github.com/osmarks/meme-search-engine/blob/master/src/reddit_dump.rs|code]] to download and embed all images ever posted (and not deleted) from Reddit (excluding NSFW, ads, etc), using streaming processing to avoid having to persist intractable amounts of data to disk. Unfortunately, it is still necessary to store embeddings, so 0.8TB of storage is still required (estimated), as well as a month of compute time. This is currently in progress. Due to the unanticipated complexity of high-performance high-recall [[vector indexing]] on osmarks.net compute budgets, the project has required more development timeslices than predicted.
\ No newline at end of file
+osmarks.net research teams designed [[https://github.com/osmarks/meme-search-engine/blob/master/src/reddit_dump.rs|code]] to download and embed all images ever posted (and not deleted) from Reddit (excluding NSFW, ads, etc), using streaming processing to avoid having to persist intractable amounts of data to disk. Unfortunately, it is still necessary to store embeddings, so 0.8TB of storage is still required (estimated), as well as a month of compute time. Due to the unanticipated complexity of high-performance high-recall [[vector indexing]] on osmarks.net compute budgets, the project required more development timeslices than predicted, but has been completed.
+
+=> Nooscope
\ No newline at end of file