Department of Statistics, University of California, Los Angeles, CA, 90095-1554, USA.
Department of Human Genetics, University of California, Los Angeles, CA, 90095-7088, USA.
Nat Commun. 2018 Mar 8;9(1):997. doi: 10.1038/s41467-018-03405-7.
The emerging single-cell RNA sequencing (scRNA-seq) technologies enable the investigation of transcriptomic landscapes at single-cell resolution. Analysis of scRNA-seq data is complicated by excess zero counts, the so-called dropouts, which arise from the low amounts of mRNA sequenced within individual cells. We introduce scImpute, a statistical method to accurately and robustly impute the dropouts in scRNA-seq data. scImpute automatically identifies likely dropouts and performs imputation only on these values, without introducing new biases to the rest of the data. scImpute also detects outlier cells and excludes them from imputation. Evaluation based on both simulated and real human and mouse scRNA-seq data suggests that scImpute is an effective tool to recover transcriptome dynamics masked by dropouts. scImpute is shown to identify likely dropouts, enhance the clustering of cell subpopulations, improve the accuracy of differential expression analysis, and aid the study of gene expression dynamics.
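The idea of imputing only the values flagged as likely dropouts, while leaving all other values untouched, can be illustrated with a minimal sketch. This is not the scImpute algorithm or its API: the function name `impute_likely_dropouts`, the 0.5 threshold, the toy matrices, and the per-gene mean used as the fill value are all assumptions chosen for illustration, and the dropout probabilities are taken as given rather than estimated from the data.

```python
from statistics import mean


def impute_likely_dropouts(expr, dropout_prob, threshold=0.5):
    """Fill in only the entries flagged as likely dropouts.

    expr         -- genes x cells expression values (list of lists)
    dropout_prob -- same shape; assumed probability that each entry is a dropout
    threshold    -- hypothetical cutoff above which an entry is treated as a dropout

    The fill value is the mean of the gene's unflagged entries, a deliberately
    simple stand-in for a real imputation model.
    """
    imputed = [row[:] for row in expr]  # copy so the input is not modified
    for g, (values, probs) in enumerate(zip(expr, dropout_prob)):
        flagged = [p > threshold for p in probs]
        reliable = [v for v, f in zip(values, flagged) if not f]
        if not reliable:
            continue  # nothing trustworthy to borrow from; leave the gene as-is
        fill = mean(reliable)
        for c, is_dropout in enumerate(flagged):
            if is_dropout:
                imputed[g][c] = fill
    return imputed


# Toy data: 3 genes x 3 cells (values are made up for illustration).
expr = [
    [0.0, 2.0, 4.0],  # the zero in cell 0 is flagged as a likely dropout
    [0.0, 0.0, 0.0],  # zeros with low dropout probability: kept as true zeros
    [1.0, 0.0, 3.0],  # the zero in cell 1 is flagged as a likely dropout
]
dropout_prob = [
    [0.9, 0.1, 0.1],
    [0.1, 0.1, 0.1],
    [0.0, 0.8, 0.0],
]

imputed = impute_likely_dropouts(expr, dropout_prob)
```

In this toy example the second gene keeps all of its zeros because none of them is flagged, which mirrors the stated goal of not introducing new biases to the rest of the data.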