Edit ‘vector_indexing’

2024-11-28 20:43:57 +00:00 · 2024-11-28 20:43:57 +00:00 · 6d183e55b0
commit 6d183e55b0
parent adfe60d681
1 changed files with 3 additions and 1 deletions
--- a/vector_indexing.myco
+++ b/vector_indexing.myco
@@ -2,4 +2,6 @@

 * graph-based
 * product quantization (lossy compression)
-* inverted lists (split vectors into clusters, search a subset of the clusters)
+* inverted lists (split vectors into clusters, search a subset of the clusters)
+
+Inverted list/product quantization indexing was historically the most common way to search large vector datasets. However, recall can be very poor in some circumstances, most notably when query and dataset vectors are drawn from significantly different distributions (see [[https://arxiv.org/abs/2305.04359]] and [[https://kay21s.github.io/RoarGraph-VLDB2024.pdf]]). The latter explains this phenomenon as resulting from the query's nearest neighbours being split across many more (and more widely distributed) clusters (cells) than with in-distribution queries, so a search that probes only a subset of the clusters misses many of them.
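
A minimal sketch of the inverted-list search described above, in pure Python, to make the "cells spanned" idea concrete. Everything here is an assumption for illustration: the helper names (`build_ivf`, `ivf_search`, `cells_spanned`), the use of sampled data points as centroids instead of proper k-means, and the shifted Gaussian used as a stand-in for an out-of-distribution query. It is not the method of either linked paper, and a toy like this may not reproduce their results; it only shows how one could measure how many cells hold a query's true nearest neighbours.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors, n_cells, rng):
    """Assign every vector to the cell of its nearest centroid.

    Simplification (assumption): centroids are sampled data points,
    with no k-means refinement.
    """
    centroids = rng.sample(vectors, n_cells)
    cells = [[] for _ in range(n_cells)]
    for idx, v in enumerate(vectors):
        nearest = min(range(n_cells), key=lambda c: dist2(v, centroids[c]))
        cells[nearest].append(idx)
    return centroids, cells


def ivf_search(query, vectors, centroids, cells, k, nprobe):
    """Scan only the nprobe cells whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: dist2(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in cells[c]]
    return sorted(candidates, key=lambda i: dist2(query, vectors[i]))[:k]


def brute_force(query, vectors, k):
    """Exact top-k by scanning every vector."""
    return sorted(range(len(vectors)), key=lambda i: dist2(query, vectors[i]))[:k]


def cells_spanned(neighbour_ids, cells):
    """Number of distinct cells that contain the given vector ids."""
    owner = {i: c for c, members in enumerate(cells) for i in members}
    return len({owner[i] for i in neighbour_ids})


def recall(found, truth):
    return len(set(found) & set(truth)) / len(truth)


rng = random.Random(0)
dim, n, k = 8, 2000, 10
data = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
centroids, cells = build_ivf(data, n_cells=32, rng=rng)

in_dist_query = [rng.gauss(0, 1) for _ in range(dim)]
# Assumed stand-in for an out-of-distribution query: different mean and scale.
ood_query = [rng.gauss(3, 0.2) for _ in range(dim)]

for name, q in [("in-distribution", in_dist_query), ("out-of-distribution", ood_query)]:
    truth = brute_force(q, data, k)
    found = ivf_search(q, data, centroids, cells, k, nprobe=2)
    print(name, "| cells spanned:", cells_spanned(truth, cells),
          "| recall:", recall(found, truth))
```

The more cells the true neighbours span, the larger `nprobe` has to be to reach the same recall, which is the cost the paragraph above attributes to out-of-distribution queries.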