Komarov Ivan, Dashti Ali, D'Souza Roshan M
Department of Mechanical Engineering, Complex Systems Simulation Lab, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, United States of America.
PLoS One. 2014 May 8;9(5):e92409. doi: 10.1371/journal.pone.0092409. eCollection 2014.
In this paper, we describe a new brute force algorithm for building the k-Nearest Neighbor Graph (k-NNG). The k-NNG algorithm has many applications in areas such as machine learning, bio-informatics, and clustering analysis. While there are very efficient algorithms for data of low dimensions, for high dimensional data the brute force search is the best algorithm. There are two main parts to the algorithm: the first part is finding the distances between the input vectors, which may be formulated as a matrix multiplication problem; the second is the selection of the k-NNs for each of the query vectors. For the second part, we describe a novel graphics processing unit (GPU)-based multi-select algorithm based on quick sort. Our optimization makes clever use of warp voting functions available on the latest GPUs along with user-controlled cache. Benchmarks show significant improvement over state-of-the-art implementations of the k-NN search on GPUs.
在本文中,我们描述了一种用于构建k近邻图(k-NNG)的新的暴力算法。k-NNG算法在机器学习、生物信息学和聚类分析等领域有许多应用。虽然对于低维数据有非常高效的算法,但对于高维数据,暴力搜索是最佳算法。该算法有两个主要部分:第一部分是找到输入向量之间的距离,这可以表述为一个矩阵乘法问题;第二部分是为每个查询向量选择k个最近邻。对于第二部分,我们描述了一种基于快速排序的新颖的基于图形处理单元(GPU)的多选择算法。我们的优化巧妙地利用了最新GPU上可用的线程束投票函数以及用户控制的缓存。基准测试表明,与GPU上k近邻搜索的现有最佳实现相比有显著改进。