高维数据的可扩展最近邻算法。

Scalable Nearest Neighbor Algorithms for High Dimensional Data.

出版信息

IEEE Trans Pattern Anal Mach Intell. 2014 Nov;36(11):2227-40. doi: 10.1109/TPAMI.2014.2321376.

DOI:10.1109/TPAMI.2014.2321376

Abstract

For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbor matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbor matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this paper, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular data set. In order to scale to very large data sets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbor matching framework that can be used with any of the algorithms described in the paper. All this research has been released as an open source library called fast library for approximate nearest neighbors (FLANN), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbor matching.

摘要

对于许多计算机视觉和机器学习问题来说，大的训练集是取得良好性能的关键。然而，许多计算机视觉和机器学习算法中最耗费计算资源的部分包括查找与表示训练数据的高维向量的最近邻匹配。我们提出了用于近似最近邻匹配的新算法，并对其进行了评估和与以前的算法进行了比较。对于匹配高维特征，我们发现两种算法最为有效：随机 k-d 森林和本文提出的一种新算法，优先级搜索 k-均值树。我们还提出了一种用于匹配二进制特征的新算法，通过搜索多个层次聚类树，并展示了它优于文献中常用的方法。我们表明，最优的最近邻算法及其参数取决于数据集的特征，并描述了一种自动配置过程，用于查找搜索特定数据集的最佳算法。为了扩展到非常大的数据集，否则它们将无法在单个机器的内存中容纳，我们提出了一个分布式最近邻匹配框架，可以与本文中描述的任何算法一起使用。所有这些研究都已作为一个名为快速近似最近邻库（FLANN）的开源库发布，该库已被集成到 OpenCV 中，现在是最流行的最近邻匹配库之一。

相似文献

Scalable Nearest Neighbor Algorithms for High Dimensional Data.

IEEE Trans Pattern Anal Mach Intell. 2014 Nov;36(11):2227-40. doi: 10.1109/TPAMI.2014.2321376.

K-nearest neighbor finding using MaxNearestDist.

IEEE Trans Pattern Anal Mach Intell. 2008 Feb;30(2):243-52. doi: 10.1109/TPAMI.2007.1182.

Fast Localization in Large-Scale Environments Using Supervised Indexing of Binary Features.

IEEE Trans Image Process. 2016 Jan;25(1):343-58. doi: 10.1109/TIP.2015.2500030. Epub 2015 Nov 11.

Image Geo-Localization Based on Multiple Nearest Neighbor Feature Matching Using Generalized Graphs.

IEEE Trans Pattern Anal Mach Intell. 2014 Aug;36(8):1546-58. doi: 10.1109/TPAMI.2014.2299799.

A fast approximate nearest neighbor search algorithm in the Hamming space.

IEEE Trans Pattern Anal Mach Intell. 2012 Dec;34(12):2481-8. doi: 10.1109/TPAMI.2012.170.

A Fast Exact k-Nearest Neighbors Algorithm for High Dimensional Search Using k-Means Clustering and Triangle Inequality.

Proc Int Jt Conf Neural Netw. 2012 Feb 8;43(6):2351-2358. doi: 10.1016/j.patcog.2010.01.003.

Asymmetric Mapping Quantization for Nearest Neighbor Search.

IEEE Trans Pattern Anal Mach Intell. 2020 Jul;42(7):1783-1790. doi: 10.1109/TPAMI.2019.2925347. Epub 2019 Jun 27.

Fast exact nearest patch matching for patch-based image editing and processing.

IEEE Trans Vis Comput Graph. 2011 Aug;17(8):1122-34. doi: 10.1109/TVCG.2010.226.

NV-tree: an efficient disk-based index for approximate search in very large high-dimensional collections.

IEEE Trans Pattern Anal Mach Intell. 2009 May;31(5):869-83. doi: 10.1109/TPAMI.2008.130.

Scalable large-margin Mahalanobis distance metric learning.

IEEE Trans Neural Netw. 2010 Sep;21(9):1524-30. doi: 10.1109/TNN.2010.2052630. Epub 2010 Aug 12.

引用本文的文献

Cognitively-plausible reinforcement learning in epidemiological agent-based simulations.

Front Epidemiol. 2025 Jul 28;5:1563731. doi: 10.3389/fepid.2025.1563731. eCollection 2025.

Telescope indexing for k-nearest neighbor search algorithms over high dimensional data & large data sets.

Sci Rep. 2025 Jul 9;15(1):24788. doi: 10.1038/s41598-025-09856-5.

Spatial-Aware and Load Balancing Distributed Data Partitioning Strategies for Content-Based Multimedia Retrieval.

Res Sq. 2024 Sep 24:rs.3.rs-4973077. doi: 10.21203/rs.3.rs-4973077/v1.

Deep Deterministic Policy Gradient-Based Resource Allocation Considering Network Slicing and Device-to-Device Communication in Mobile Networks.

Sensors (Basel). 2024 Sep 20;24(18):6079. doi: 10.3390/s24186079.

A novel 3D LiDAR deep learning approach for uncrewed vehicle odometry.

PeerJ Comput Sci. 2024 Jul 17;10:e2189. doi: 10.7717/peerj-cs.2189. eCollection 2024.

Adaptive robotic system for the inspection of aerospace slat actuator mount.

Front Robot AI. 2024 Jun 27;11:1423319. doi: 10.3389/frobt.2024.1423319. eCollection 2024.

Fast and exact fixed-radius neighbor search based on sorting.

PeerJ Comput Sci. 2024 Mar 29;10:e1929. doi: 10.7717/peerj-cs.1929. eCollection 2024.

Zoish: A Novel Feature Selection Approach Leveraging Shapley Additive Values for Machine Learning Applications in Healthcare.

Pac Symp Biocomput. 2024;29:81-95.

Ultra-fast and accurate electron ionization mass spectrum matching for compound identification with million-scale in-silico library.

Nat Commun. 2023 Jun 22;14(1):3722. doi: 10.1038/s41467-023-39279-7.

Online Multivariate Anomaly Detection and Localization for High-Dimensional Settings.

Sensors (Basel). 2022 Oct 28;22(21):8264. doi: 10.3390/s22218264.

文献AI研究员

20分钟写一篇综述，助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型，支持多种主流文档格式。

立即体验

高维数据的可扩展最近邻算法。

Scalable Nearest Neighbor Algorithms for High Dimensional Data.

出版信息

相似文献

引用本文的文献

文献AI研究员

用中文搜PubMed

文档翻译

Suppr 超能文献

相似文献

引用本文的文献