School of Artificial Intelligence, Jilin University, Jilin, China.
Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada.
Nat Commun. 2023 Jan 25;14(1):400. doi: 10.1038/s41467-023-36134-7.
Single-cell RNA sequencing provides high-throughput gene expression information to explore cellular heterogeneity at the individual cell level. A major challenge in characterizing high-throughput gene expression data stems from its dimensionality and the prevalence of dropout events. To address these concerns, we develop a deep graph learning method, scMGCA, for single-cell data analysis. scMGCA is based on a graph-embedding autoencoder that simultaneously learns cell-cell topology representation and cluster assignments. We show that scMGCA is accurate and effective for cell segregation and batch effect correction, outperforming other state-of-the-art models across multiple platforms. In addition, we perform genomic interpretation on the key compressed transcriptomic space of the graph-embedding autoencoder to demonstrate the underlying gene regulation mechanism. We demonstrate that in a pancreatic ductal adenocarcinoma dataset, scMGCA successfully provides annotations on the specific cell types and reveals differential gene expression levels across multiple tumor-associated and cell signalling pathways.
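To make the idea of a graph-embedding autoencoder that yields both a cell-cell topology representation and cluster assignments more concrete, the following is a minimal, untrained forward-pass sketch in NumPy. It is not the scMGCA implementation: the abstract does not specify the architecture, so the k-nearest-neighbour graph, the single graph-convolution layer, the inner-product graph decoder, the Student's t soft assignment, and every function name, parameter, and the synthetic count matrix are assumptions chosen purely for illustration.

```python
import numpy as np

def knn_adjacency(X, k=3):
    """Symmetric k-nearest-neighbour cell graph (no self-loops), Euclidean distance."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # a cell is never its own neighbour
    idx = np.argsort(d, axis=1)[:, :k]     # k closest cells for each cell
    A = np.zeros_like(d)
    A[np.repeat(np.arange(X.shape[0]), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)              # symmetrise the directed kNN edges

def normalize_adjacency(A):
    """D^{-1/2} (A + I) D^{-1/2}, a normalisation commonly used in graph convolutions."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def encode(X, A_norm, W):
    """One graph-convolution layer: ReLU(A_norm @ X @ W)."""
    return np.maximum(A_norm @ X @ W, 0.0)

def reconstruct_graph(Z):
    """Inner-product decoder: sigmoid(Z Z^T), read as edge probabilities."""
    return 1.0 / (1.0 + np.exp(-(Z @ Z.T)))

def soft_assign(Z, centers):
    """Student's t soft cluster assignment; each row sums to 1."""
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    q = 1.0 / (1.0 + d2)
    return q / q.sum(axis=1, keepdims=True)

# Synthetic stand-in for a log-transformed count matrix: 12 cells x 20 genes.
rng = np.random.default_rng(0)
X = np.log1p(rng.poisson(2.0, size=(12, 20)).astype(float))

A = knn_adjacency(X, k=3)
A_norm = normalize_adjacency(A)
W = rng.normal(scale=0.1, size=(20, 4))    # random, untrained encoder weights
Z = encode(X, A_norm, W)                   # 4-dimensional cell embedding
A_rec = reconstruct_graph(Z)               # decoded cell-cell graph
Q = soft_assign(Z, Z[:3])                  # first 3 embeddings as placeholder centroids
```

In a trained model of this general kind, the weights and centroids would be optimised jointly, for example against a graph reconstruction loss on `A_rec` and a clustering loss on `Q`; here they are left random so that only the data flow is shown.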