School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi, 710072, People's Republic of China.
Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA.
Genome Biol. 2019 Dec 10;20(1):269. doi: 10.1186/s13059-019-1898-6.
Dimensionality reduction is an indispensable analytic component for many areas of single-cell RNA sequencing (scRNA-seq) data analysis. Proper dimensionality reduction can allow for effective noise removal and facilitate many downstream analyses that include cell clustering and lineage reconstruction. Unfortunately, despite the critical importance of dimensionality reduction in scRNA-seq analysis and the vast number of dimensionality reduction methods developed for scRNA-seq studies, few comprehensive comparison studies have been performed to evaluate the effectiveness of different dimensionality reduction methods in scRNA-seq.
We aim to fill this critical knowledge gap by providing a comparative evaluation of a variety of commonly used dimensionality reduction methods for scRNA-seq studies. Specifically, we compare 18 different dimensionality reduction methods on 30 publicly available scRNA-seq datasets that cover a range of sequencing techniques and sample sizes. We evaluate how well each method preserves neighborhoods, measured by its ability to recover features of the original expression matrix, and how it performs in cell clustering and lineage reconstruction, measured by accuracy and robustness. We also evaluate the computational scalability of each method by recording its computational cost.
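To make "neighborhood preservation" concrete, here is a minimal sketch under one assumption: preservation is scored as the average Jaccard index between each cell's k nearest neighbors in the original expression space and its k nearest neighbors in the reduced space. The function names, the choice of Euclidean distance, and the toy data are illustrative assumptions, not the study's own scripts.

```python
import math


def knn_indices(points, k):
    """For each point, return the set of indices of its k nearest neighbors
    (Euclidean distance, the point itself excluded; ties broken by index)."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def neighborhood_preservation(original, reduced, k):
    """Average Jaccard index between kNN sets in the original and reduced spaces.
    1.0 means every cell keeps exactly the same k neighbors after reduction."""
    if len(original) != len(reduced):
        raise ValueError("original and reduced must describe the same cells")
    nn_orig = knn_indices(original, k)
    nn_red = knn_indices(reduced, k)
    scores = [len(a & b) / len(a | b) for a, b in zip(nn_orig, nn_red)]
    return sum(scores) / len(scores)


# Toy example: six "cells" in three dimensions, where only the first axis varies.
cells = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0),
         (5.0, 0.0, 0.0), (5.1, 0.0, 0.0), (5.2, 0.0, 0.0)]
first_axis = [(c[0],) for c in cells]  # a trivial 1-D "reduction"
print(neighborhood_preservation(cells, first_axis, k=2))  # 1.0
```

Keeping only the informative axis leaves every neighborhood intact, so the score is 1.0. A reduction that keeps a noisy axis instead would score lower. In practice the reduced coordinates would come from a method such as PCA, and k would be tuned to the dataset size.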
Based on the comprehensive evaluation results, we provide important guidelines for choosing dimensionality reduction methods for scRNA-seq data analysis. We also provide all analysis scripts used in the present study at www.xzlab.org/reproduce.html.