Li Huamin, Kluger Yuval, Tygert Mark
Yale University, Program in Applied Mathematics, 51 Prospect St., New Haven, CT 06510.
Yale University, School of Medicine, Department of Pathology, Suite 505L, 300 George St., New Haven, CT 06520.
Adv Comput Math. 2018 Oct;44(5):1651-1672. doi: 10.1007/s10444-018-9600-1. Epub 2018 Mar 19.
Randomized algorithms provide solutions to two ubiquitous problems: (1) the distributed calculation of a principal component analysis or singular value decomposition of a highly rectangular matrix, and (2) the distributed calculation of a low-rank approximation (in the form of a singular value decomposition) to an arbitrary matrix. Carefully honed algorithms yield results that are uniformly superior to those of the stock, deterministic implementations in Spark (the popular platform for distributed computation); in particular, whereas the stock software will without warning return left singular vectors that are far from numerically orthonormal, a significantly burnished randomized implementation generates left singular vectors that are numerically orthonormal to nearly the machine precision.
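For illustration only, the second problem (a low-rank approximation in the form of a singular value decomposition) can be sketched on a single machine with NumPy. This is not the authors' distributed Spark implementation. It is a generic randomized range-finder with power iterations, and the function names `randomized_svd` and `orthonormality_error`, along with the parameters `oversample`, `n_iter`, and `seed`, are assumptions introduced here. The helper `orthonormality_error` measures how far the computed left singular vectors are from numerically orthonormal, which is the quality criterion the abstract emphasizes.

```python
import numpy as np


def randomized_svd(a, rank, oversample=10, n_iter=2, seed=0):
    """Approximate the top `rank` singular triplets of `a` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    m, n = a.shape
    k = min(rank + oversample, m, n)

    # Sketch the column space of `a` with a Gaussian test matrix,
    # then orthonormalize the sketch.
    q, _ = np.linalg.qr(a @ rng.standard_normal((n, k)))

    # Power iterations sharpen the captured subspace; re-orthonormalizing
    # after every product keeps the basis numerically stable.
    for _ in range(n_iter):
        q, _ = np.linalg.qr(a.T @ q)
        q, _ = np.linalg.qr(a @ q)

    # Project onto the small subspace and take an exact SVD there.
    b = q.T @ a
    u_small, s, vt = np.linalg.svd(b, full_matrices=False)

    # Lift the left singular vectors back to the original space.
    u = q @ u_small
    return u[:, :rank], s[:rank], vt[:rank]


def orthonormality_error(u):
    """Spectral norm of U^T U - I; near machine precision when U is orthonormal."""
    return np.linalg.norm(u.T @ u - np.eye(u.shape[1]), 2)
```

As a usage sketch, calling `randomized_svd(a, rank=5)` on a tall matrix `a` of exact rank 5 should recover `a` to within rounding error, and `orthonormality_error(u)` on the returned `u` should be close to machine precision, because `u` is the product of two matrices with orthonormal columns.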