Andrecut M
Institute for Biocomplexity and Informatics, University of Calgary, Calgary, Alberta, Canada.
J Comput Biol. 2009 Nov;16(11):1593-9. doi: 10.1089/cmb.2008.0221.
Principal component analysis (PCA) is a key statistical technique for multivariate data analysis. For large data sets, PCA is commonly computed with the standard nonlinear iterative partial least squares (NIPALS) PCA algorithm. Unfortunately, this algorithm suffers from a loss of orthogonality among the computed components, so its applicability is usually limited to estimating the first few components. Here we present an algorithm based on Gram-Schmidt orthogonalization, called GS-PCA, which eliminates this shortcoming of NIPALS-PCA. We also discuss GPU (graphics processing unit) parallel implementations of both the NIPALS-PCA and GS-PCA algorithms. The numerical results show that the optimized GPU parallel versions, based on CUBLAS (NVIDIA), are substantially faster (up to 12 times) than the optimized CPU versions based on CBLAS (GNU Scientific Library).
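The abstract describes GS-PCA only at a high level. The following minimal pure-Python sketch illustrates the idea under one assumption: that GS-PCA can be read as a NIPALS power iteration in which each new loading vector p and score vector t is re-orthogonalized by classical Gram-Schmidt against the loadings and scores already found, followed by rank-one deflation of the residual. All names (`gs_pca`, `project_out`, `tol`, `max_iter`), the k-th-column starting guess, and the relative stopping rule are hypothetical choices made for this sketch. The sketch does not reproduce the paper's exact update order or its CBLAS/CUBLAS implementations; in those implementations the matrix-vector products would be BLAS `gemv`-style calls.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """A @ x for a row-major m x n matrix A (list of rows)."""
    return [dot(row, x) for row in A]


def rmatvec(A, y):
    """A^T @ y for a row-major m x n matrix A."""
    out = [0.0] * len(A[0])
    for row, yi in zip(A, y):
        for j, a in enumerate(row):
            out[j] += a * yi
    return out


def project_out(v, basis):
    """Classical Gram-Schmidt step: subtract from v its components along each
    vector in basis (assumed mutually orthogonal and nonzero)."""
    coeffs = [dot(b, v) / dot(b, b) for b in basis]
    for c, b in zip(coeffs, basis):
        v = [vi - c * bi for vi, bi in zip(v, b)]
    return v


def gs_pca(X, n_components, tol=1e-12, max_iter=10000):
    """Hypothetical GS-PCA-style sketch. Returns (T, P, lambdas): score vectors,
    unit loading vectors, and score norms (at convergence, singular values of
    the column-centered data)."""
    m, n = len(X), len(X[0])
    if not 0 < n_components <= min(m, n):
        raise ValueError("n_components must be in 1..min(m, n)")
    means = [sum(row[j] for row in X) / m for j in range(n)]
    R = [[row[j] - means[j] for j in range(n)] for row in X]  # centered residual
    T, P, lambdas = [], [], []
    for k in range(n_components):
        t = [row[k] for row in R]  # starting guess: k-th residual column
        mu = 0.0
        for _ in range(max_iter):
            # Loading update, re-orthogonalized against earlier loadings.
            p = project_out(rmatvec(R, t), P)
            norm_p = math.sqrt(dot(p, p))
            if norm_p == 0.0:
                raise ValueError("residual exhausted at component %d" % k)
            p = [x / norm_p for x in p]
            # Score update, re-orthogonalized against earlier scores.
            t = project_out(matvec(R, p), T)
            lam = math.sqrt(dot(t, t))
            if abs(lam - mu) <= tol * lam:  # relative change in the score norm
                break
            mu = lam
        # Deflation: remove the rank-one piece t p^T from the residual.
        R = [[R[i][j] - t[i] * p[j] for j in range(n)] for i in range(m)]
        T.append(t)
        P.append(p)
        lambdas.append(lam)
    return T, P, lambdas
```

Deleting the two `project_out` calls leaves a plain NIPALS iteration. In that version, orthogonality between components is maintained only implicitly through deflation and can drift as rounding errors accumulate. For example, the centered columns of the 4 × 3 matrix `[[13, 7, 0], [13, 3, -2], [7, 7, -2], [7, 3, 0]]` are mutually orthogonal with norms 6, 4, and 2, and `gs_pca(X, 3)` returns `lambdas == [6.0, 4.0, 2.0]`.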