INRIA, Campus de Beaulieu, 35042 Rennes, France.
IEEE Trans Pattern Anal Mach Intell. 2012 Sep;34(9):1704-16. doi: 10.1109/TPAMI.2011.235.
This paper addresses the problem of large-scale image search. Three constraints have to be taken into account: search accuracy, efficiency, and memory usage. We first present and evaluate different ways of aggregating local image descriptors into a vector and show that the Fisher kernel achieves better performance than the reference bag-of-visual-words approach for any given vector dimension. We then jointly optimize dimensionality reduction and indexing in order to obtain a precise vector comparison as well as a compact representation. The evaluation shows that the image representation can be reduced to a few dozen bytes while preserving high accuracy. Searching a 100-million-image data set takes about 250 ms on one processor core.
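As a rough illustration of what "aggregating local image descriptors into a vector" means, the following minimal sketch implements the reference bag-of-visual-words baseline mentioned in the abstract, not the Fisher kernel representation the paper advocates. Each local descriptor is assigned to its nearest codebook centroid, and the assignment counts form an L2-normalized histogram. The function names, the toy 2-D descriptors, and the 3-word codebook are hypothetical choices made for illustration; a real system would use high-dimensional local descriptors and a codebook learned by clustering.

```python
import math


def nearest_word(descriptor, codebook):
    """Return the index of the codebook centroid closest to the descriptor."""
    best_index, best_dist = 0, float("inf")
    for index, centroid in enumerate(codebook):
        # Squared Euclidean distance between descriptor and centroid.
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, centroid))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bag_of_visual_words(descriptors, codebook):
    """Aggregate local descriptors into one L2-normalized histogram."""
    histogram = [0.0] * len(codebook)
    for descriptor in descriptors:
        histogram[nearest_word(descriptor, codebook)] += 1.0
    norm = math.sqrt(sum(v * v for v in histogram))
    # Leave an all-zero histogram unchanged (image with no descriptors).
    return [v / norm for v in histogram] if norm > 0 else histogram


# Toy data (hypothetical): four 2-D descriptors and a 3-word codebook.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 0.1], [1.1, -0.1], [0.0, 0.8]]
image_vector = bag_of_visual_words(descriptors, codebook)
```

The resulting vector has one dimension per visual word, so its length is fixed by the codebook size regardless of how many local descriptors the image contains. That fixed length is what makes the later dimensionality reduction and indexing steps applicable.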