Deep structural clustering for single-cell RNA-seq data jointly through autoencoder and graph neural network.

Affiliations

School of Computer Science and Technology, Donghua University, Shanghai 201600, China.

School of Computer Science and Technology, Shanghai University, Shanghai 200444, China.

Publication information

Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbac018.

DOI:10.1093/bib/bbac018

PMID:35172334

Abstract

Single-cell RNA sequencing (scRNA-seq) permits researchers to study the complex mechanisms of cell heterogeneity and diversity. Unsupervised clustering is central to the analysis of scRNA-seq data because it can be used to identify putative cell types. However, owing to noise, high dimensionality and pervasive dropout events, clustering scRNA-seq data remains a computational challenge. Here, we propose a new deep structural clustering method for scRNA-seq data, named scDSC, which integrates structural information into the deep clustering of single cells. scDSC consists of a Zero-Inflated Negative Binomial (ZINB) model-based autoencoder, a graph neural network (GNN) module and a mutual-supervised module. To learn a data representation from sparse, zero-inflated scRNA-seq data, we add a ZINB model to the basic autoencoder. The GNN module is introduced to capture the structural information among cells. By joining the ZINB-based autoencoder with the GNN module, the model transfers the data representation learned by the autoencoder to the corresponding GNN layer. Furthermore, we adopt a mutual-supervised strategy to unify these two different deep neural architectures and to guide the clustering task. Extensive experiments on six real scRNA-seq datasets demonstrate that scDSC outperforms state-of-the-art methods in terms of clustering accuracy and scalability. scDSC is implemented in Python using the PyTorch machine-learning library and is freely available at https://github.com/DHUDBlab/scDSC.
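To make the role of the ZINB model concrete, the following is a minimal sketch of the standard ZINB negative log-likelihood for a single count, written with the Python standard library only. It is not taken from the scDSC code base: the function names `nb_log_pmf` and `zinb_nll` and the parameter names `mu` (mean), `theta` (dispersion) and `pi` (dropout probability) are illustrative assumptions, and the abstract does not state how scDSC parameterizes or weights this term.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial
    with mean mu > 0 and dispersion theta > 0 (assumed parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of one count x under a ZINB model.

    pi in (0, 1) is the probability of an extra (dropout) zero; with
    probability 1 - pi the count comes from the negative binomial.
    """
    log_nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero is either a dropout or a genuine negative binomial zero.
        return -math.log(pi + (1.0 - pi) * math.exp(log_nb))
    # A positive count can only come from the negative binomial component.
    return -(math.log(1.0 - pi) + log_nb)
```

In a ZINB-based autoencoder, the decoder would typically output `mu`, `theta` and `pi` for every gene in every cell, and a loss of this form would be summed over all entries of the count matrix; a practical implementation would vectorize it with PyTorch tensors rather than evaluate it one count at a time.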
