From 6d183e55b086e2b90fed8e46158f561ac7b3da7e Mon Sep 17 00:00:00 2001
From: osmarks <osmarks@mycorrhiza>
Date: Thu, 28 Nov 2024 20:43:57 +0000
Subject: [PATCH] =?UTF-8?q?Edit=20=E2=80=98vector=5Findexing=E2=80=99?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 vector_indexing.myco | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/vector_indexing.myco b/vector_indexing.myco
index 53009af..88415c5 100644
--- a/vector_indexing.myco
+++ b/vector_indexing.myco
@@ -2,4 +2,6 @@
 
 * graph-based
 * product quantization (lossy compression)
-* inverted lists (split vectors into clusters, search a subset of the clusters)
\ No newline at end of file
+* inverted lists (split vectors into clusters, search a subset of the clusters)
+
+Inverted list/product quantization indexes were historically the most common way to search large vector datasets. However, recall can be very poor in some circumstances, most notably when query and dataset vectors are drawn from significantly different distributions (see [[https://arxiv.org/abs/2305.04359]] and [[https://kay21s.github.io/RoarGraph-VLDB2024.pdf]]). The latter explains this phenomenon as resulting from a query's nearest neighbours being split across many more (and more widely distributed) clusters (cells) than with in-distribution queries.
\ No newline at end of file