Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey 07102, USA.
Department of Computer Science, Wake Forest University, Winston-Salem, North Carolina 27109, USA.
Genome Res. 2022 Oct;32(10):1906-1917. doi: 10.1101/gr.276477.121. Epub 2022 Oct 5.
Spatially resolved scRNA-seq (sp-scRNA-seq) technologies offer the potential to comprehensively profile gene expression patterns in tissue context. However, the development of computational methods lags behind the advances in these technologies, which limits the fulfillment of that potential. In this study, we develop a deep learning approach for clustering sp-scRNA-seq data, named Deep Spatially constrained Single-cell Clustering (DSSC). The model integrates the spatial information of cells into the clustering process in two steps: (1) the spatial information is encoded using a graph neural network model, and (2) cell-to-cell constraints are built from the spatial expression patterns of marker genes and added to the model to guide the clustering process. Deep embedded clustering is then performed on the bottleneck layer of the autoencoder using a Kullback-Leibler (KL) divergence objective, jointly with the learning of the feature representation. DSSC is the first model that can use information from both spatial coordinates and marker genes to guide cell/spot clustering. Extensive experiments on both simulated and real data sets show that DSSC significantly improves clustering performance compared with state-of-the-art methods. Its performance is robust across data sets that differ in cell type/tissue organization and/or cell type/tissue spatial dependency. We conclude that DSSC is a promising tool for clustering sp-scRNA-seq data.
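The KL-based clustering step and the cell-to-cell constraints can be illustrated with a minimal NumPy sketch. It assumes the standard deep embedded clustering formulation (Student's t soft assignments and a sharpened target distribution) and a common must-link/cannot-link penalty on pairs of soft assignments; the abstract does not give the exact DSSC losses, so every function name, parameter, and toy value here is hypothetical rather than taken from the authors' implementation.

```python
import numpy as np

def soft_assign(z, centers, alpha=1.0):
    """Student's t soft assignment of embeddings z (n, d) to centers (k, d)."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened target P: square Q, divide by cluster frequency, renormalize rows."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_clustering_loss(p, q, eps=1e-12):
    """KL(P || Q), summed over all cells and clusters."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def pairwise_constraint_loss(q, must_link, cannot_link, eps=1e-12):
    """Assumed penalty: must-link pairs should share a cluster, cannot-link pairs should not."""
    loss = 0.0
    for i, j in must_link:
        loss += -np.log(np.dot(q[i], q[j]) + eps)
    for i, j in cannot_link:
        loss += -np.log(1.0 - np.dot(q[i], q[j]) + eps)
    return float(loss)

# Toy bottleneck embeddings: two well-separated groups of five cells each.
rng = np.random.default_rng(0)
z = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(3, 0.1, (5, 2))])
centers = np.array([[0.0, 0.0], [3.0, 3.0]])

q = soft_assign(z, centers)
p = target_distribution(q)
kl = kl_clustering_loss(p, q)

# Constraints that agree with the grouping cost less than ones that contradict it.
good = pairwise_constraint_loss(q, must_link=[(0, 1)], cannot_link=[(0, 9)])
bad = pairwise_constraint_loss(q, must_link=[(0, 9)], cannot_link=[(0, 1)])
```

In a full model these terms would be added to the autoencoder's reconstruction loss and minimized by gradient descent; in this sketch the embeddings `z` are fixed and only the loss values are computed.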