随机近邻算法。

Randomized approximate nearest neighbors algorithm.

机构信息

Yale University, A. K. Watson Hall, 51 Prospect Street, New Haven, CT 06511, USA.

出版信息

Proc Natl Acad Sci U S A. 2011 Sep 20;108(38):15679-86. doi: 10.1073/pnas.1107769108. Epub 2011 Sep 1.

DOI:10.1073/pnas.1107769108

PMID:21885738

原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC3179075/

Abstract

We present a randomized algorithm for the approximate nearest neighbor problem in d-dimensional Euclidean space. Given N points {x(j)} in R(d), the algorithm attempts to find k nearest neighbors for each of x(j), where k is a user-specified integer parameter. The algorithm is iterative, and its running time requirements are proportional to T·N·(d·(log d) + k·(d + log k)·(log N)) + N·k(2)·(d + log k), with T the number of iterations performed. The memory requirements of the procedure are of the order N·(d + k). A by-product of the scheme is a data structure, permitting a rapid search for the k nearest neighbors among {x(j)} for an arbitrary point x ∈ R(d). The cost of each such query is proportional to T·(d·(log d) + log(N/k)·k·(d + log k)), and the memory requirements for the requisite data structure are of the order N·(d + k) + T·(d + N). The algorithm utilizes random rotations and a basic divide-and-conquer scheme, followed by a local graph search. We analyze the scheme's behavior for certain types of distributions of {x(j)} and illustrate its performance via several numerical examples.

摘要

我们提出了一种用于欧几里得空间中 d 维近似最近邻问题的随机算法。给定 R(d)中的 N 个点 {x(j)}，该算法试图为每个 x(j)找到 k 个最近邻，其中 k 是用户指定的整数参数。该算法是迭代的，其运行时间要求与 T·N·(d·(log d) + k·(d + log k)·(log N)) + N·k(2)·(d + log k)成正比，其中 T 是执行的迭代次数。该过程的内存要求为 O(N·(d + k))。该方案的一个副产品是一种数据结构，允许在 R(d)中的任意点 x 中快速搜索 {x(j)}中的 k 个最近邻。每次这样的查询的成本与 T·(d·(log d) + log(N/k)·k·(d + log k))成正比，并且所需数据结构的内存要求为 O(N·(d + k) + T·(d + N))。该算法利用随机旋转和基本的分治方案，然后进行局部图搜索。我们分析了该方案在某些类型的 {x(j)}分布下的行为，并通过几个数值示例说明了其性能。

相似文献

Randomized approximate nearest neighbors algorithm.随机近邻算法。

Proc Natl Acad Sci U S A. 2011 Sep 20;108(38):15679-86. doi: 10.1073/pnas.1107769108. Epub 2011 Sep 1.

Randomized near-neighbor graphs, giant components and applications in data science.随机近邻图、巨分量及其在数据科学中的应用。

J Appl Probab. 2020 Jun;57(2):458-476. doi: 10.1017/jpr.2020.21. Epub 2020 Jul 16.

Scalable Nearest Neighbor Algorithms for High Dimensional Data.高维数据的可扩展最近邻算法。

IEEE Trans Pattern Anal Mach Intell. 2014 Nov;36(11):2227-40. doi: 10.1109/TPAMI.2014.2321376.

A gamma dose distribution evaluation technique using the k-d tree for nearest neighbor searching.使用 k-d 树进行最近邻搜索的伽马剂量分布评估技术。

Med Phys. 2010 Sep;37(9):4868-73. doi: 10.1118/1.3480964.

Fast construction of k-nearest neighbor graphs for point clouds.点云的 k-最近邻图的快速构建。

IEEE Trans Vis Comput Graph. 2010 Jul-Aug;16(4):599-608. doi: 10.1109/TVCG.2010.9.

A Fast Exact k-Nearest Neighbors Algorithm for High Dimensional Search Using k-Means Clustering and Triangle Inequality.一种使用k均值聚类和三角不等式进行高维搜索的快速精确k近邻算法。

Proc Int Jt Conf Neural Netw. 2012 Feb 8;43(6):2351-2358. doi: 10.1016/j.patcog.2010.01.003.

K-nearest neighbor finding using MaxNearestDist.使用最大最近距离进行K近邻查找。

IEEE Trans Pattern Anal Mach Intell. 2008 Feb;30(2):243-52. doi: 10.1109/TPAMI.2007.1182.

Accurate prediction of enzyme subfamily class using an adaptive fuzzy k-nearest neighbor method.使用自适应模糊k近邻法准确预测酶亚家族类别。

Biosystems. 2007 Sep-Oct;90(2):405-13. doi: 10.1016/j.biosystems.2006.10.004. Epub 2006 Oct 26.

A portable clustering algorithm based on compact neighbors for face tagging.基于紧凑邻居的便携式人脸标记聚类算法。

Neural Netw. 2022 Oct;154:508-520. doi: 10.1016/j.neunet.2022.07.025. Epub 2022 Jul 27.

Reducing the time requirement of k-means algorithm.降低 K-Means 算法的时间要求。

PLoS One. 2012;7(12):e49946. doi: 10.1371/journal.pone.0049946. Epub 2012 Dec 11.

引用本文的文献

Predictive modeling of biomedical temporal data in healthcare applications: review and future directions.医疗保健应用中生物医学时间数据的预测建模：综述与未来方向

Front Physiol. 2024 Oct 15;15:1386760. doi: 10.3389/fphys.2024.1386760. eCollection 2024.

Signal enhancement for two-dimensional cryo-EM data processing.二维冷冻电镜数据处理中的信号增强

Biol Imaging. 2023 Mar 9;3:e7. doi: 10.1017/S2633903X23000065. eCollection 2023.

GraFT: Graph Filtered Temporal Dictionary Learning for Functional Neural Imaging.GraFT：用于功能神经成像的图过滤时频字典学习。

IEEE Trans Image Process. 2022;31:3509-3524. doi: 10.1109/TIP.2022.3171414. Epub 2022 May 18.

Randomized near-neighbor graphs, giant components and applications in data science.随机近邻图、巨分量及其在数据科学中的应用。

J Appl Probab. 2020 Jun;57(2):458-476. doi: 10.1017/jpr.2020.21. Epub 2020 Jul 16.

A nearest-neighbors network model for sequence data reveals new insight into genotype distribution of a pathogen.一种用于序列数据的最近邻网络模型揭示了病原体基因型分布的新见解。

BMC Bioinformatics. 2018 Dec 12;19(1):475. doi: 10.1186/s12859-018-2453-2.

Fast k-NNG construction with GPU-based quick multi-select.基于GPU的快速多选择实现快速k近邻图构建

PLoS One. 2014 May 8;9(5):e92409. doi: 10.1371/journal.pone.0092409. eCollection 2014.

Rotationally invariant image representation for viewing direction classification in cryo-EM.用于冷冻电镜中观察方向分类的旋转不变图像表示

J Struct Biol. 2014 Apr;186(1):153-66. doi: 10.1016/j.jsb.2014.03.003. Epub 2014 Mar 12.

Making sense of big data.理解大数据。

Proc Natl Acad Sci U S A. 2013 Nov 5;110(45):18031-2. doi: 10.1073/pnas.1317797110. Epub 2013 Oct 21.

Efficient computation of k-Nearest Neighbour Graphs for large high-dimensional data sets on GPU clusters.GPU 集群上大规模高维数据集的 k-最近邻图的高效计算。

PLoS One. 2013 Sep 23;8(9):e74113. doi: 10.1371/journal.pone.0074113. eCollection 2013.

A random-permutations-based approach to fast read alignment.基于随机排列的快速读取对齐方法。

BMC Bioinformatics. 2013;14 Suppl 5(Suppl 5):S8. doi: 10.1186/1471-2105-14-S5-S8. Epub 2013 Apr 10.

本文引用的文献

A fast randomized algorithm for overdetermined linear least-squares regression.一种用于超定线性最小二乘回归的快速随机算法。

Proc Natl Acad Sci U S A. 2008 Sep 9;105(36):13212-7. doi: 10.1073/pnas.0804869105. Epub 2008 Sep 8.

Randomized algorithms for the low-rank approximation of matrices.矩阵低秩逼近的随机算法。

Proc Natl Acad Sci U S A. 2007 Dec 18;104(51):20167-72. doi: 10.1073/pnas.0709640104. Epub 2007 Dec 4.

文献检索

告别复杂PubMed语法，用中文像聊天一样搜索，搜遍4000万医学文献。AI智能推荐，让科研检索更轻松。

立即免费搜索

文件翻译

保留排版，准确专业，支持PDF/Word/PPT等文件格式，支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述，25分钟生成高质量综述，智能提取关键信息，辅助科研写作。

立即免费体验