College of Life Sciences, Northwest A&F University, Yangling, 712100 Shaanxi, China.
College of Information Engineering, Northwest A&F University, Yangling, 712100 Shaanxi, China.
Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae558.
Single-cell RNA sequencing (scRNA-seq) offers unprecedented insights into transcriptome-wide gene expression at the single-cell level. Cell clustering has long been an established step in scRNA-seq data analysis for identifying groups of cells with similar expression profiles. However, cell clustering is technically challenging because raw scRNA-seq data pose various analytical issues, including high dimensionality and dropout events. Existing research has developed deep learning models, such as graph machine learning models and contrastive learning-based models, for clustering cells from scRNA-seq data, and has summarized the unsupervised learning of cell clustering in a human-interpretable format. Despite these profound advances, a simple yet effective framework for learning the high-quality representations needed for robust clustering is still lacking. In this study, we propose scSimGCL, a novel framework based on the graph contrastive learning paradigm for self-supervised pretraining of graph neural networks. This framework facilitates the generation of high-quality representations crucial for cell clustering. scSimGCL combines cell-cell graph structure with contrastive learning to enhance cell clustering performance. Extensive experimental results on simulated and real scRNA-seq datasets suggest the superiority of the proposed scSimGCL. Moreover, clustering assignment analysis confirms that scSimGCL is generally applicable across clustering algorithms, including state-of-the-art ones. Further, an ablation study and a hyperparameter analysis support the efficacy of our network architecture and the robustness of its design decisions in the self-supervised learning setting. The proposed scSimGCL can serve as a robust framework for practitioners developing tools for cell clustering. The source code of scSimGCL is publicly available at https://github.com/zhangzh1328/scSimGCL.
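The abstract describes scSimGCL only at a high level. As a rough illustration of the general graph contrastive learning idea it builds on, the following NumPy sketch builds a k-nearest-neighbour cell-cell graph, encodes two randomly masked views of an expression matrix with a single GCN layer, and scores the two views with an InfoNCE loss. This is not the scSimGCL architecture: the kNN graph construction, the feature-masking augmentation, the one-layer GCN, the hypothetical helper names (`knn_graph`, `gcn_layer`, `info_nce`), the temperature `tau=0.5` and the synthetic data are all assumptions chosen for illustration, and only the forward pass is shown (no training loop). The authors' implementation is in the repository linked above.

```python
import numpy as np

# Illustrative sketch only (assumed design, not the scSimGCL code):
# kNN cell graph + one GCN layer + InfoNCE between two augmented views.


def knn_graph(x: np.ndarray, k: int) -> np.ndarray:
    """Build a symmetric, unweighted k-nearest-neighbour cell-cell graph
    from cosine similarity between expression profiles (rows of x)."""
    xn = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)
    sim = xn @ xn.T
    np.fill_diagonal(sim, -np.inf)            # exclude self-loops
    nbrs = np.argsort(-sim, axis=1)[:, :k]    # k most similar cells per row
    n = x.shape[0]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(adj, adj.T)             # symmetrise


def gcn_layer(adj: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ w, 0.0)


def info_nce(z1: np.ndarray, z2: np.ndarray, tau: float = 0.5) -> float:
    """InfoNCE loss: cell i in view 1 should match cell i in view 2
    (positive pair) rather than any other cell (negatives)."""
    z1 = z1 / (np.linalg.norm(z1, axis=1, keepdims=True) + 1e-12)
    z2 = z2 / (np.linalg.norm(z2, axis=1, keepdims=True) + 1e-12)
    logits = z1 @ z2.T / tau
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Toy example: 60 cells x 30 genes of synthetic, log-transformed counts.
rng = np.random.default_rng(0)
x = np.log1p(rng.poisson(1.0, size=(60, 30)).astype(float))
adj = knn_graph(x, k=5)
w = rng.normal(scale=0.1, size=(30, 16))          # untrained encoder weights

# Two stochastic views via random feature masking (dropout-like augmentation).
view1 = x * (rng.random(x.shape) > 0.2)
view2 = x * (rng.random(x.shape) > 0.2)
z1, z2 = gcn_layer(adj, view1, w), gcn_layer(adj, view2, w)
loss = info_nce(z1, z2)
print(f"InfoNCE loss: {loss:.4f}")
```

In a real pipeline the encoder weights would be optimized to minimize this loss, and the learned embeddings would then be passed to a clustering algorithm (for example k-means); that step is omitted here.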