Edit ‘vector_indexing’

osmarks 2024-11-28 20:53:58 +00:00 committed by wikimind
parent b4b44c38b3
commit 87b1cbf769

@@ -6,4 +6,4 @@
IVF-DAC (for some reason), which is just inverted lists combined with product quantization, was historically the most common way to search large vector datasets. However, recall is very bad in some circumstances (most notably when query/dataset vectors are drawn from significantly different distributions: see [[https://arxiv.org/abs/2305.04359]] and [[https://kay21s.github.io/RoarGraph-VLDB2024.pdf]]). The latter explains this phenomenon as resulting from the nearest neighbours being split across many more (and more widely distributed) clusters (cells) than with in-distribution queries.
-Graph-based approaches aim to create graphs such that a greedy search on the graph toward closer (by the vector distance metric) points rapidly converges
+Graph-based approaches aim to create graphs such that a greedy search on the graph toward closer (by the vector distance metric) points rapidly converges on (most of the time) the best-matching point.
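
To make the greedy search described above concrete, here is a minimal sketch in Python. It is not taken from the page or from any particular library: the names `greedy_search`, `vectors` and `neighbours` are hypothetical, the distance metric is assumed to be Euclidean, and the graph is a hand-built toy rather than one produced by a real index construction algorithm.

```python
import math

def greedy_search(vectors, neighbours, query, start):
    """Walk the graph from `start`, repeatedly moving to whichever
    neighbour is closest to `query`, until no neighbour improves."""
    current = start
    current_dist = math.dist(vectors[current], query)  # Euclidean (assumed metric)
    while True:
        best, best_dist = current, current_dist
        for n in neighbours[current]:
            d = math.dist(vectors[n], query)
            if d < best_dist:
                best, best_dist = n, d
        if best == current:
            # Local minimum: no neighbour is closer than the current node.
            return current, current_dist
        current, current_dist = best, best_dist

# Toy dataset: five points on a line, linked in a chain (hypothetical graph).
vectors = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

result, dist = greedy_search(vectors, neighbours, (2.9, 0.1), start=0)
```

The search stops at a local minimum of distance over the graph, which is why the result is the best-matching point only most of the time. Practical graph indexes typically keep a small set of candidates (a beam) rather than a single current node to reduce the chance of getting stuck.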