Edit ‘meme_search_engine/reddit_dump’

This commit is contained in:
osmarks 2024-11-28 15:20:36 +00:00 committed by wikimind
parent 44662723a5
commit b51dec1865


@@ -1 +1 @@
-osmarks.net research teams designed [[https://github.com/osmarks/meme-search-engine/blob/master/src/reddit_dump.rs|code]] to download and embed all images ever posted (and not deleted) from Reddit (excluding NSFW, ads, etc), using streaming processing to avoid having to persist intractable amounts of data to disk. Unfortunately, it is still necessary to store embeddings, so 0.8TB of storage is still required (estimated), as well as a month of compute time. This is currently in progress.
+osmarks.net research teams designed [[https://github.com/osmarks/meme-search-engine/blob/master/src/reddit_dump.rs|code]] to download and embed all images ever posted (and not deleted) from Reddit (excluding NSFW, ads, etc), using streaming processing to avoid having to persist intractable amounts of data to disk. Unfortunately, it is still necessary to store embeddings, so 0.8TB of storage is still required (estimated), as well as a month of compute time. This is currently in progress. Due to the unanticipated complexity of high-performance high-recall [[vector indexing]] on osmarks.net compute budgets, the project has required more development timeslices than predicted.
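For intuition on where an estimate of that order comes from: embedding storage scales as images × dimensions × bytes per element. The figures below are illustrative assumptions only (the actual image count, embedding dimensionality, and precision used by the project are not stated here), chosen to show one combination that lands near 0.8 TB:

```rust
fn main() {
    // Hypothetical inputs -- NOT the project's actual numbers.
    let images: u64 = 200_000_000; // assumed count of embedded images
    let dim: u64 = 1024;           // assumed embedding dimensionality
    let bytes_per_elem: u64 = 4;   // assumed f32 storage per element

    // Total bytes = images * dim * bytes per element.
    let total_bytes = images * dim * bytes_per_elem;
    println!("estimated storage: {:.2} TB", total_bytes as f64 / 1e12);
}
```

Halving any one factor (e.g. f16 instead of f32, or a 512-dimension model) halves the footprint, which is why quantization and dimensionality choices matter at this scale.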