INRIA Rennes, Campus de Beaulieu, 263 Avenue du Général Leclerc, 35042 Rennes, France.
IEEE Trans Pattern Anal Mach Intell. 2011 Jan;33(1):117-28. doi: 10.1109/TPAMI.2010.57.
This paper introduces a product quantization-based approach for approximate nearest neighbor search. The idea is to decompose the space into a Cartesian product of low-dimensional subspaces and to quantize each subspace separately. A vector is represented by a short code composed of its subspace quantization indices. The euclidean distance between two vectors can be efficiently estimated from their codes. An asymmetric version increases precision, as it computes the approximate distance between a vector and a code. Experimental results show that our approach searches for nearest neighbors efficiently, in particular in combination with an inverted file system. Results for SIFT and GIST image descriptors show excellent search accuracy, outperforming three state-of-the-art approaches. The scalability of our approach is validated on a data set of two billion vectors.
本文提出了一种基于乘积量化的近似最近邻搜索方法。其思想是将空间分解为低维子空间的笛卡尔积,并分别对每个子空间进行量化。向量由其子空间量化索引组成的短码表示。通过代码可以有效地估计两个向量之间的欧几里得距离。不对称版本提高了精度,因为它计算了向量和代码之间的近似距离。实验结果表明,我们的方法能够有效地搜索最近邻,特别是与倒排文件系统结合使用时。SIFT 和 GIST 图像描述符的实验结果表明,搜索精度非常高,优于三种最先进的方法。我们的方法在一个包含 20 亿个向量的数据集上的可扩展性得到了验证。