Department of Probability and Statistics, School of Mathematical Sciences, Peking University, Beijing 100871, China.
Damo Academy, Alibaba Group, Beijing 100029, China.
Bioinformatics. 2020 May 1;36(10):3156-3161. doi: 10.1093/bioinformatics/btaa139.
Single-cell RNA sequencing (scRNA-seq) technology enables whole-transcriptome profiling at single-cell resolution and holds great promise for many biological and medical applications. Nevertheless, scRNA-seq often fails to capture expressed genes, leading to the prominent dropout problem. These dropouts cause many problems in downstream analysis, such as a significant increase in noise, loss of power in differential expression analysis, and obscured gene-to-gene and cell-to-cell relationships. Imputing these dropout values can therefore benefit scRNA-seq data analysis.
In this article, we model the dropout imputation problem as a robust matrix decomposition. This model requires minimal assumptions and allows us to develop a computationally efficient imputation method called scRMD. Extensive data analysis shows that scRMD accurately recovers dropout values and helps improve downstream analyses such as differential expression analysis and clustering.
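To make "robust matrix decomposition" concrete, here is a minimal Python/NumPy sketch under stated assumptions. It assumes a generic low-rank-plus-sparse model Y ≈ L − S, where L is a low-rank matrix of true expression and S is a nonnegative sparse matrix of dropout amounts that may be nonzero only where the observed count Y is zero. It fits the penalized objective ½‖Y − L + S‖²_F + lam·‖L‖_* + tau·ΣS by alternating exact block updates (singular value thresholding for L, nonnegative soft thresholding for S). The function names (`svt`, `objective`, `robust_decompose`), the penalty weights `lam` and `tau`, and the synthetic demo data are hypothetical; this is not the scRMD package's implementation or its exact formulation.

```python
import numpy as np

# Hypothetical generic sketch, NOT the scRMD package's algorithm.
# Assumed model: Y ~ L - S, with L low-rank and S >= 0 sparse, nonzero only where Y == 0.
# Assumed objective: 0.5*||Y - L + S||_F^2 + lam*||L||_* + tau*sum(S)


def svt(M, lam):
    """Singular value thresholding: the proximal operator of lam * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt


def objective(Y, L, S, lam, tau):
    """Value of the assumed penalized least-squares objective."""
    nuclear = np.linalg.svd(L, compute_uv=False).sum()
    return 0.5 * np.sum((Y - L + S) ** 2) + lam * nuclear + tau * S.sum()


def robust_decompose(Y, lam=10.0, tau=0.1, n_iter=100):
    """Alternate exact block updates of L and S; return L, S, imputed Y + S, history."""
    Y = np.asarray(Y, dtype=float)
    zero_mask = Y == 0  # candidate dropout positions
    S = np.zeros_like(Y)
    history = []
    for _ in range(n_iter):
        # L-step: minimize 0.5*||(Y + S) - L||^2 + lam*||L||_* exactly via SVT.
        L = svt(Y + S, lam)
        # S-step: closed-form nonnegative soft threshold, restricted to zeros of Y.
        S = np.where(zero_mask, np.maximum(L - Y - tau, 0.0), 0.0)
        history.append(objective(Y, L, S, lam, tau))
    return L, S, Y + S, history


# Toy demo on synthetic data: a positive rank-2 matrix with ~20% of entries zeroed out.
rng = np.random.default_rng(0)
X_true = rng.uniform(1, 3, size=(30, 2)) @ rng.uniform(1, 3, size=(2, 20))
drop = rng.uniform(size=X_true.shape) < 0.2
Y = np.where(drop, 0.0, X_true)
L, S, X_hat, history = robust_decompose(Y, lam=10.0, tau=0.1, n_iter=100)

err_zero = np.abs(X_true[drop]).mean()  # error if dropouts are left at 0
err_imputed = np.abs(X_hat[drop] - X_true[drop]).mean()
print(f"mean abs error on dropouts: zeros={err_zero:.3f}, imputed={err_imputed:.3f}")
```

Because each pass minimizes the objective exactly over one block while holding the other fixed, the recorded objective never increases. The imputed matrix is Y + S, so observed nonzero counts are left untouched and only zero entries can be filled in.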
The R package scRMD is available at https://github.com/XiDsLab/scRMD.
Supplementary data are available at Bioinformatics online.