School of Computer Science and Technology, Donghua University 201600, Shanghai, China.
School of Computer Science and Technology, Shanghai University 200444, Shanghai, China.
Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbac018.
Single-cell RNA sequencing (scRNA-seq) permits researchers to study the complex mechanisms of cell heterogeneity and diversity. Unsupervised clustering is central to the analysis of scRNA-seq data, as it can be used to identify putative cell types. However, owing to noise, high dimensionality and pervasive dropout events, clustering scRNA-seq data remains computationally challenging. Here, we propose a new deep structural clustering method for scRNA-seq data, named scDSC, which integrates structural information into the deep clustering of single cells. The proposed scDSC consists of a Zero-Inflated Negative Binomial (ZINB) model-based autoencoder, a graph neural network (GNN) module and a mutual-supervision module. To learn data representations from sparse, zero-inflated scRNA-seq data, we add a ZINB model to the basic autoencoder. The GNN module is introduced to capture structural information among cells. By coupling the ZINB-based autoencoder with the GNN module, the model transfers the data representation learned by the autoencoder to the corresponding GNN layer. Furthermore, we adopt a mutual-supervision strategy to unify these two different deep neural architectures and to guide the clustering task. Extensive experiments on six real scRNA-seq datasets demonstrate that scDSC outperforms state-of-the-art methods in terms of clustering accuracy and scalability. scDSC is implemented in Python using the PyTorch machine-learning library and is freely available at https://github.com/DHUDBlab/scDSC.
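The ZINB component of the autoencoder can be made concrete through its negative log-likelihood, which ZINB autoencoders typically minimize as the reconstruction loss. The following is a minimal pure-Python sketch, assuming (as in DCA-style ZINB autoencoders) that the decoder emits, for every cell-gene entry, a dropout probability pi, a mean mu > 0 and a dispersion theta > 0. The model is ZINB(x) = pi * [x == 0] + (1 - pi) * NB(x; mu, theta). The function names are hypothetical, and this is an illustration rather than the scDSC implementation.

```python
import math


def nb_log_prob(x: float, mu: float, theta: float) -> float:
    """Log-probability of count x under a negative binomial with mean mu > 0
    and dispersion theta > 0 (variance = mu + mu**2 / theta)."""
    log_theta_mu = math.log(theta + mu)
    return (
        math.lgamma(x + theta)
        - math.lgamma(theta)
        - math.lgamma(x + 1.0)
        + theta * (math.log(theta) - log_theta_mu)
        + x * (math.log(mu) - log_theta_mu)
    )


def zinb_nll(x: float, pi: float, mu: float, theta: float, eps: float = 1e-10) -> float:
    """Negative log-likelihood of one entry under
    ZINB(x) = pi * [x == 0] + (1 - pi) * NB(x; mu, theta)."""
    if x == 0:
        # A zero can come either from the dropout component or from the NB itself.
        nb_zero = (theta / (theta + mu)) ** theta
        return -math.log(pi + (1.0 - pi) * nb_zero + eps)
    # A positive count can only come from the NB component.
    return -(math.log(1.0 - pi + eps) + nb_log_prob(x, mu, theta))


def zinb_loss(X, Pi, Mu, Theta) -> float:
    """Mean ZINB negative log-likelihood over a cells x genes count matrix X,
    given hypothetical per-entry decoder outputs Pi, Mu, Theta of the same shape."""
    total, n = 0.0, 0
    for xs, pis, mus, thetas in zip(X, Pi, Mu, Theta):
        for x, pi, mu, theta in zip(xs, pis, mus, thetas):
            total += zinb_nll(x, pi, mu, theta)
            n += 1
    return total / n
```

The zero case is what separates ZINB from a plain negative binomial: a large pi lets the model explain an observed zero as a dropout without forcing mu toward zero, which is the property that makes it suited to sparse scRNA-seq counts.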
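The GNN module and the mutual-supervision strategy described in the abstract can be sketched in the style of SDCN-type structural deep clustering, which this design resembles. The sketch assumes that formulation and is not taken from the scDSC source. Each GCN layer propagates over a symmetrically normalized cell-cell graph (for example a KNN graph, an assumption here) after mixing in the same-depth autoencoder representation with a hypothetical weight eps = 0.5. The autoencoder embedding yields Student's t soft assignments Q (as in DEC), a sharpened target P is derived from Q, and P supervises both branches through KL(P || Q) and KL(P || Z), where Z is the row-wise softmax of the GCN output. All function names and parameter values are hypothetical.

```python
import math


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2 of a cell-cell graph."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer_with_transfer(a_norm, z_prev, h_prev, weight, eps=0.5):
    """One ReLU GCN layer whose input mixes the previous GCN output z_prev with the
    same-depth autoencoder representation h_prev: (1 - eps) * z_prev + eps * h_prev."""
    mixed = [[(1.0 - eps) * z + eps * h for z, h in zip(zr, hr)]
             for zr, hr in zip(z_prev, h_prev)]
    out = matmul(matmul(a_norm, mixed), weight)
    return [[max(0.0, v) for v in row] for row in out]


def soft_assign(h, centers, alpha=1.0):
    """Student's t soft assignment q_ij of embedded cell i to cluster center j (DEC)."""
    q = []
    for hi in h:
        w = [(1.0 + sum((a - b) ** 2 for a, b in zip(hi, c)) / alpha) ** (-(alpha + 1.0) / 2.0)
             for c in centers]
        s = sum(w)
        q.append([v / s for v in w])
    return q


def target_distribution(q):
    """Sharpened target p_ij proportional to q_ij**2 / f_j, with f_j = sum_i q_ij."""
    f = [sum(col) for col in zip(*q)]
    p = []
    for row in q:
        w = [v * v / fj for v, fj in zip(row, f)]
        s = sum(w)
        p.append([v / s for v in w])
    return p


def softmax_rows(logits):
    """Row-wise softmax that turns GCN outputs into cluster probabilities Z."""
    out = []
    for row in logits:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def kl_div(p, q, eps=1e-12):
    """KL(P || Q) summed over cells and clusters."""
    return sum(pv * math.log(pv / max(qv, eps))
               for pr, qr in zip(p, q) for pv, qv in zip(pr, qr) if pv > 0)


def mutual_supervision_losses(h_ae, gcn_logits, centers):
    """The target P, derived from the autoencoder branch, supervises both branches."""
    q = soft_assign(h_ae, centers)
    p = target_distribution(q)  # treated as a fixed "teacher" distribution
    z = softmax_rows(gcn_logits)
    return kl_div(p, q), kl_div(p, z)
```

In training, P would typically be recomputed only periodically and held fixed in between so that it acts as a stable target. The relative weights of the two KL terms and the ZINB reconstruction loss are left open in this sketch.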