College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China.
Int J Mol Sci. 2024 May 29;25(11):5976. doi: 10.3390/ijms25115976.
Single-cell RNA sequencing (scRNA-seq) is widely used to interpret cellular states, detect cell subpopulations, and study disease mechanisms. In scRNA-seq data analysis, cell clustering is a key step for identifying cell types. However, scRNA-seq data are high-dimensional and highly sparse, which makes clustering considerably harder. In the high-dimensional gene expression space, cells may form complex topological structures. Many conventional scRNA-seq analysis methods focus on identifying cell subgroups rather than exploring these high-dimensional structures in detail, and although some methods have begun to consider the topological structure of the data, many still overlook the continuity and complex topology present in single-cell data. We propose scZAG, a deep learning framework that first uses a zero-inflated negative binomial (ZINB) model to denoise the highly sparse and over-dispersed scRNA-seq data. Next, scZAG applies an adaptive graph contrastive representation learning approach that combines approximate personalized propagation of neural predictions graph convolution (APPNPGCN) with graph contrastive learning. Using APPNPGCN as the encoder for graph contrastive learning ensures that each cell's representation reflects not only its own features but also its position in the graph and its relationships with other cells. Graph contrastive learning exploits the relationships between nodes to capture the similarity among cells, which better represents the data's underlying continuity and complex topology. Finally, the learned low-dimensional latent representations are clustered using a Kullback-Leibler divergence objective. On 10 common scRNA-seq datasets, scZAG showed better clustering performance than existing state-of-the-art clustering methods.
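As a rough illustration of two of the steps named in the abstract, the sketch below shows APPNP-style propagation over a cell graph and a Kullback-Leibler clustering loss in plain NumPy. It is not the authors' implementation: the function names, the toy graph, the values of `alpha` and `k`, and the DEC-style Student's t soft assignment and sharpened target distribution are all assumptions made for the example, since the abstract does not specify them.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetrically normalize A + I (self-loops added), as in GCN/APPNP."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def appnp_propagate(h, adj, alpha=0.1, k=10):
    """Approximate personalized propagation: Z <- (1 - alpha) * A_norm @ Z + alpha * H."""
    a_norm = normalized_adjacency(adj)
    z = h.copy()
    for _ in range(k):
        z = (1.0 - alpha) * (a_norm @ z) + alpha * h
    return z

def soft_assign(z, centers):
    """Student's t kernel (1 degree of freedom) between embeddings and cluster centers.
    Assumption: DEC-style soft assignment; the abstract does not state the kernel."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = 1.0 / (1.0 + d2)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened target P derived from Q (assumption: DEC-style target)."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_clustering_loss(q):
    """KL(P || Q) summed over all cells and clusters."""
    p = target_distribution(q)
    return float(np.sum(p * np.log(p / q)))

# Toy example (hypothetical data): 4 cells forming two connected pairs.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)
h = rng.normal(size=(4, 3))       # stand-in for encoder outputs, one row per cell
z = appnp_propagate(h, adj)       # embeddings smoothed along graph edges
centers = z[[0, 2]]               # hypothetical initial cluster centers
q = soft_assign(z, centers)       # soft cluster memberships, rows sum to 1
loss = kl_clustering_loss(q)      # quantity a trainer would minimize
```

In a full pipeline the loss would be minimized by gradient descent over the encoder weights and cluster centers; here it is only evaluated once to show how the quantities fit together.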