Department of Biomedical Informatics, Center for Noncoding RNA Medicine, School of Basic Medical Sciences, Peking University, Beijing, China.
MOE Key Laboratory of Molecular Cardiovascular Sciences, Peking University, Beijing, China.
Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac313.
Single-cell RNA-sequencing (scRNA-seq) has been widely used to depict gene expression profiles at single-cell resolution. However, its relatively high dropout rate often results in artificial zero expression values for genes and therefore compromises the reliability of results. To overcome this unwanted sparsity of scRNA-seq data, several imputation algorithms have been developed to recover single-cell expression profiles. Here, we propose a novel approach, GE-Impute, that imputes dropout zeros in scRNA-seq data with a graph embedding-based neural network model. GE-Impute learns a neural graph representation of each cell and reconstructs the cell-cell similarity network accordingly, which enables better imputation of dropout zeros based on the more accurately assigned neighbors in the similarity network. Gene expression correlation analysis between true expression data and simulated dropout data suggests that GE-Impute performs significantly better at recovering dropout zeros for both droplet- and plate-based scRNA-seq data. GE-Impute also outperforms other imputation methods in identifying differentially expressed genes and in improving unsupervised clustering of datasets from various scRNA-seq techniques. Moreover, GE-Impute enhances the identification of marker genes, facilitating cell type assignment of clusters. In trajectory analysis, GE-Impute improves the analysis of time-course scRNA-seq data and the reconstruction of differentiation trajectories. Together, these results demonstrate that GE-Impute can be a useful method for recovering single-cell expression profiles, thus enabling better biological interpretation of scRNA-seq data. GE-Impute is implemented in Python and is freely available at https://github.com/wxbCaterpillar/GE-Impute.
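The last two steps described above (rebuilding the cell-cell similarity network from learned cell representations, then filling dropout zeros from network neighbors) can be illustrated with a minimal sketch. It is not the GE-Impute implementation: it assumes the per-cell embeddings are already given as input (GE-Impute learns them with a graph-embedding neural network, which is not reproduced here), and the cosine similarity, the k-nearest-neighbor rule, the neighbor-mean fill, and the helper names `knn_graph` and `impute_zeros` are all hypothetical illustrative choices. It uses only the Python standard library so it runs as-is.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = cells, columns = genes (or embedding dims)


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def knn_graph(embeddings: Matrix, k: int) -> List[List[int]]:
    """Hypothetical helper: for each cell, return the indices of its k most
    similar cells in embedding space, excluding the cell itself."""
    neighbors = []
    for i, ei in enumerate(embeddings):
        scored = [(cosine(ei, ej), j) for j, ej in enumerate(embeddings) if j != i]
        # Highest similarity first; ties broken by the smaller cell index.
        scored.sort(key=lambda t: (-t[0], t[1]))
        neighbors.append([j for _, j in scored[:k]])
    return neighbors


def impute_zeros(expr: Matrix, neighbors: List[List[int]]) -> Matrix:
    """Hypothetical helper: replace each zero entry with the mean value of
    that gene across the cell's neighbors; non-zero entries are kept."""
    imputed = [row[:] for row in expr]
    for i, row in enumerate(expr):
        nbrs = neighbors[i]
        for g, value in enumerate(row):
            if value == 0.0 and nbrs:
                imputed[i][g] = sum(expr[j][g] for j in nbrs) / len(nbrs)
    return imputed


# Toy example: four cells forming two groups in a 2-D embedding space.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
expr = [[5.0, 0.0], [4.0, 1.0], [0.0, 7.0], [1.0, 8.0]]

neighbors = knn_graph(embeddings, k=1)   # [[1], [0], [3], [2]]
imputed = impute_zeros(expr, neighbors)  # [[5.0, 1.0], [4.0, 1.0], [1.0, 7.0], [1.0, 8.0]]
```

Only zeros are modified, so observed non-zero counts are never overwritten; the point of the sketch is that the quality of the imputed values depends entirely on which cells end up as neighbors, which is why a better similarity network should yield better imputation.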