Edit ‘vector_indexing’
This commit is contained in:
parent
6d183e55b0
commit
7e3e3cd754
@ -4,4 +4,4 @@
|
||||
* product quantization (lossy compression)
|
||||
* inverted lists (split vectors into clusters, search a subset of the clusters)
|
||||
|
||||
Inverted list/product quantization was historically the most common way to search large vector datasets. However, recall is very bad in some circumstances (most notably when query/dataset vectors are drawn from significantly different distributions: see [[https://arxiv.org/abs/2305.04359]] and [[https://kay21s.github.io/RoarGraph-VLDB2024.pdf]]. The latter explains this phenomenon as resulting from the nearest neighbours being split across many more (and more widely distributed) clusters (cells) than with in-distribution queries.
|
||||
IVF-DAC (for some reason), which is just inverted lists combined with product quantization, was historically the most common way to search large vector datasets. However, recall is very bad in some circumstances (most notably when query/dataset vectors are drawn from significantly different distributions: see [[https://arxiv.org/abs/2305.04359]] and [[https://kay21s.github.io/RoarGraph-VLDB2024.pdf]]. The latter explains this phenomenon as resulting from the nearest neighbours being split across many more (and more widely distributed) clusters (cells) than with in-distribution queries.
|
Loading…
Reference in New Issue
Block a user