Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research, Wako, Saitama, 351-0198, Japan.
Japan Science and Technology Agency, PRESTO, 5-3, Yonbancho, Chiyoda-ku, Tokyo, 102-8666, Japan.
Genome Biol. 2020 Jan 20;21(1):9. doi: 10.1186/s13059-019-1900-3.
Principal component analysis (PCA) is an essential method for analyzing single-cell RNA-seq (scRNA-seq) datasets, but on large-scale scRNA-seq datasets it takes a long time to compute and consumes large amounts of memory.
In this work, we review existing fast and memory-efficient PCA algorithms and implementations and evaluate their practical application to large-scale scRNA-seq datasets. Our benchmark shows that some PCA algorithms based on Krylov subspace methods and randomized singular value decomposition are fast, memory-efficient, and more accurate than the other algorithms.
We develop a guideline to select an appropriate PCA implementation based on the differences in the computational environment of users and developers.
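As an illustration of the randomized singular value decomposition approach named above, the following is a minimal NumPy sketch, not one of the implementations benchmarked in the paper. The function name `randomized_pca` and its parameters (`n_oversamples`, `n_iter`, `seed`) are hypothetical choices made for this example. It assumes a dense cells-by-genes matrix that fits in memory, so it shows the idea of the algorithm rather than a memory-efficient implementation.

```python
import numpy as np


def randomized_pca(X, n_components, n_oversamples=10, n_iter=4, seed=0):
    """Approximate PCA of a dense (cells x genes) matrix via randomized SVD.

    Returns (scores, components, singular_values). Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)  # center each gene (this densifies sparse input)

    # Sketch width: requested components plus a few extra for accuracy.
    k = min(n_components + n_oversamples, min(Xc.shape))

    # Project onto a random subspace and orthonormalize the result.
    Q, _ = np.linalg.qr(Xc @ rng.standard_normal((Xc.shape[1], k)))

    # Power iterations sharpen the subspace; QR at each step keeps it stable.
    for _ in range(n_iter):
        Z, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Z)

    # Exact SVD of the small (k x genes) projected matrix.
    B = Q.T @ Xc
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small

    scores = U[:, :n_components] * s[:n_components]  # cell embeddings
    return scores, Vt[:n_components], s[:n_components]


if __name__ == "__main__":
    # Synthetic low-rank data standing in for an expression matrix.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 50))
    scores, components, s = randomized_pca(X, n_components=5)
    print(scores.shape, components.shape)
```

The expensive steps are products of the data matrix with thin matrices, which is why this family of methods scales to large datasets. A practical implementation would avoid the explicit dense centering step used here.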