Yale University, A. K. Watson Hall, 51 Prospect Street, New Haven, CT 06511, USA.
Proc Natl Acad Sci U S A. 2011 Sep 20;108(38):15679-86. doi: 10.1073/pnas.1107769108. Epub 2011 Sep 1.
We present a randomized algorithm for the approximate nearest neighbor problem in d-dimensional Euclidean space. Given N points {x(j)} in R(d), the algorithm attempts to find k nearest neighbors for each of x(j), where k is a user-specified integer parameter. The algorithm is iterative, and its running time requirements are proportional to T·N·(d·(log d) + k·(d + log k)·(log N)) + N·k(2)·(d + log k), with T the number of iterations performed. The memory requirements of the procedure are of the order N·(d + k). A by-product of the scheme is a data structure, permitting a rapid search for the k nearest neighbors among {x(j)} for an arbitrary point x ∈ R(d). The cost of each such query is proportional to T·(d·(log d) + log(N/k)·k·(d + log k)), and the memory requirements for the requisite data structure are of the order N·(d + k) + T·(d + N). The algorithm utilizes random rotations and a basic divide-and-conquer scheme, followed by a local graph search. We analyze the scheme's behavior for certain types of distributions of {x(j)} and illustrate its performance via several numerical examples.
我们提出了一种用于欧几里得空间中 d 维近似最近邻问题的随机算法。给定 R(d)中的 N 个点 {x(j)},该算法试图为每个 x(j)找到 k 个最近邻,其中 k 是用户指定的整数参数。该算法是迭代的,其运行时间要求与 T·N·(d·(log d) + k·(d + log k)·(log N)) + N·k(2)·(d + log k)成正比,其中 T 是执行的迭代次数。该过程的内存要求为 O(N·(d + k))。该方案的一个副产品是一种数据结构,允许在 R(d)中的任意点 x 中快速搜索 {x(j)}中的 k 个最近邻。每次这样的查询的成本与 T·(d·(log d) + log(N/k)·k·(d + log k))成正比,并且所需数据结构的内存要求为 O(N·(d + k) + T·(d + N))。该算法利用随机旋转和基本的分治方案,然后进行局部图搜索。我们分析了该方案在某些类型的 {x(j)}分布下的行为,并通过几个数值示例说明了其性能。