Suppr超能文献

随机近邻算法。

Randomized approximate nearest neighbors algorithm.

机构信息

Yale University, A. K. Watson Hall, 51 Prospect Street, New Haven, CT 06511, USA.

出版信息

Proc Natl Acad Sci U S A. 2011 Sep 20;108(38):15679-86. doi: 10.1073/pnas.1107769108. Epub 2011 Sep 1.

Abstract

We present a randomized algorithm for the approximate nearest neighbor problem in d-dimensional Euclidean space. Given N points {x(j)} in R(d), the algorithm attempts to find k nearest neighbors for each of x(j), where k is a user-specified integer parameter. The algorithm is iterative, and its running time requirements are proportional to T·N·(d·(log d) + k·(d + log k)·(log N)) + N·k(2)·(d + log k), with T the number of iterations performed. The memory requirements of the procedure are of the order N·(d + k). A by-product of the scheme is a data structure, permitting a rapid search for the k nearest neighbors among {x(j)} for an arbitrary point x ∈ R(d). The cost of each such query is proportional to T·(d·(log d) + log(N/k)·k·(d + log k)), and the memory requirements for the requisite data structure are of the order N·(d + k) + T·(d + N). The algorithm utilizes random rotations and a basic divide-and-conquer scheme, followed by a local graph search. We analyze the scheme's behavior for certain types of distributions of {x(j)} and illustrate its performance via several numerical examples.

摘要

我们提出了一种用于欧几里得空间中 d 维近似最近邻问题的随机算法。给定 R(d)中的 N 个点 {x(j)},该算法试图为每个 x(j)找到 k 个最近邻,其中 k 是用户指定的整数参数。该算法是迭代的,其运行时间要求与 T·N·(d·(log d) + k·(d + log k)·(log N)) + N·k(2)·(d + log k)成正比,其中 T 是执行的迭代次数。该过程的内存要求为 O(N·(d + k))。该方案的一个副产品是一种数据结构,允许在 R(d)中的任意点 x 中快速搜索 {x(j)}中的 k 个最近邻。每次这样的查询的成本与 T·(d·(log d) + log(N/k)·k·(d + log k))成正比,并且所需数据结构的内存要求为 O(N·(d + k) + T·(d + N))。该算法利用随机旋转和基本的分治方案,然后进行局部图搜索。我们分析了该方案在某些类型的 {x(j)}分布下的行为,并通过几个数值示例说明了其性能。

相似文献

1
Randomized approximate nearest neighbors algorithm.随机近邻算法。
Proc Natl Acad Sci U S A. 2011 Sep 20;108(38):15679-86. doi: 10.1073/pnas.1107769108. Epub 2011 Sep 1.
3
Scalable Nearest Neighbor Algorithms for High Dimensional Data.高维数据的可扩展最近邻算法。
IEEE Trans Pattern Anal Mach Intell. 2014 Nov;36(11):2227-40. doi: 10.1109/TPAMI.2014.2321376.
5
Fast construction of k-nearest neighbor graphs for point clouds.点云的 k-最近邻图的快速构建。
IEEE Trans Vis Comput Graph. 2010 Jul-Aug;16(4):599-608. doi: 10.1109/TVCG.2010.9.
7
K-nearest neighbor finding using MaxNearestDist.使用最大最近距离进行K近邻查找。
IEEE Trans Pattern Anal Mach Intell. 2008 Feb;30(2):243-52. doi: 10.1109/TPAMI.2007.1182.
10
Reducing the time requirement of k-means algorithm.降低 K-Means 算法的时间要求。
PLoS One. 2012;7(12):e49946. doi: 10.1371/journal.pone.0049946. Epub 2012 Dec 11.

引用本文的文献

2
Signal enhancement for two-dimensional cryo-EM data processing.二维冷冻电镜数据处理中的信号增强
Biol Imaging. 2023 Mar 9;3:e7. doi: 10.1017/S2633903X23000065. eCollection 2023.
6
Fast k-NNG construction with GPU-based quick multi-select.基于GPU的快速多选择实现快速k近邻图构建
PLoS One. 2014 May 8;9(5):e92409. doi: 10.1371/journal.pone.0092409. eCollection 2014.
8
Making sense of big data.理解大数据。
Proc Natl Acad Sci U S A. 2013 Nov 5;110(45):18031-2. doi: 10.1073/pnas.1317797110. Epub 2013 Oct 21.
10
A random-permutations-based approach to fast read alignment.基于随机排列的快速读取对齐方法。
BMC Bioinformatics. 2013;14 Suppl 5(Suppl 5):S8. doi: 10.1186/1471-2105-14-S5-S8. Epub 2013 Apr 10.

本文引用的文献

1
A fast randomized algorithm for overdetermined linear least-squares regression.一种用于超定线性最小二乘回归的快速随机算法。
Proc Natl Acad Sci U S A. 2008 Sep 9;105(36):13212-7. doi: 10.1073/pnas.0804869105. Epub 2008 Sep 8.
2
Randomized algorithms for the low-rank approximation of matrices.矩阵低秩逼近的随机算法。
Proc Natl Acad Sci U S A. 2007 Dec 18;104(51):20167-72. doi: 10.1073/pnas.0709640104. Epub 2007 Dec 4.

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验