School of Mathematics, Harbin Institute of Technology, Harbin, P.R. China.
PLoS Comput Biol. 2021 Jun 17;17(6):e1009118. doi: 10.1371/journal.pcbi.1009118. eCollection 2021 Jun.
Single-cell RNA sequencing (scRNA-seq) technologies measure gene expression at single-cell resolution and provide a tool for exploring cell heterogeneity and cell types. Because only a small number of mRNA copies is extracted from each cell, scRNA-seq data exhibit a large number of dropouts, which hinder downstream analysis. We propose a statistical method, SDImpute (Single-cell RNA-seq Dropout Imputation), that performs block imputation of dropout events in scRNA-seq data. SDImpute automatically identifies dropout events based on gene expression levels and the variation of gene expression across similar cells and similar genes, and it imputes dropouts in blocks using gene expression values from similar cells that are unaffected by dropouts. Results on simulated and real datasets suggest that SDImpute effectively recovers the data and preserves the heterogeneity of gene expression across cells. Compared with state-of-the-art imputation methods, SDImpute improves the accuracy of downstream analyses, including clustering, visualization, and differential expression analysis.
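The general idea of borrowing information from similar cells can be illustrated with a toy sketch. This is not the SDImpute algorithm: the abstract does not give its formulas, so the Euclidean neighbor search, the `min_fraction` threshold, and the function names `nearest_cells` and `impute_dropouts` below are all assumptions made for illustration. The sketch flags a zero as a likely dropout only when most of a cell's nearest neighbors express that gene, then fills it with the mean of the neighbors' nonzero values.

```python
import math


def nearest_cells(expr, i, k):
    """Return indices of the k cells closest to cell i (Euclidean distance)."""
    dists = []
    for j, other in enumerate(expr):
        if j != i:
            dists.append((math.dist(expr[i], other), j))
    dists.sort()
    return [j for _, j in dists[:k]]


def impute_dropouts(expr, k=2, min_fraction=0.5):
    """Toy neighbor-based imputation on a cells x genes matrix (list of lists).

    A zero is treated as a likely dropout only when more than `min_fraction`
    of the k nearest cells express that gene; it is then replaced by the mean
    of the neighbors' nonzero values. All other zeros are kept as true zeros.
    Hypothetical simplification, not the published SDImpute procedure.
    """
    imputed = [row[:] for row in expr]  # copy so the input is not modified
    for i, row in enumerate(expr):
        neighbors = nearest_cells(expr, i, k)
        if not neighbors:  # a single cell has nothing to borrow from
            continue
        for g, value in enumerate(row):
            if value != 0:
                continue  # observed values are left untouched
            observed = [expr[j][g] for j in neighbors if expr[j][g] > 0]
            if len(observed) / len(neighbors) > min_fraction:
                imputed[i][g] = sum(observed) / len(observed)
    return imputed


# Made-up example data: rows are cells, columns are genes.
cells = [
    [5.0, 0.0, 3.0],  # gene 1 is zero here but expressed in similar cells
    [5.0, 2.0, 3.0],
    [4.0, 2.0, 3.0],
    [0.0, 0.0, 9.0],  # a different cell type with genuine zeros
    [0.0, 0.0, 8.0],
]
result = impute_dropouts(cells, k=2)
```

In this made-up example, the zero in the first cell is filled from its two neighbors, while the zeros in the last two cells stay at zero because too few of their neighbors express those genes. Keeping such zeros is one simple way to avoid erasing real differences between cell types.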