Dai Qiguo, Liu Wuhao, Yu Xianhai, Duan Xiaodong, Liu Ziqiang
School of Computer Science and Engineering, Dalian Minzu University, Dalian, 116650, China.
SEAC Key Laboratory of Big Data Applied Technology, Dalian Minzu University, Dalian, 116650, China.
Interdiscip Sci. 2025 Apr 3. doi: 10.1007/s12539-025-00700-y.
Accurately identifying cell types in single-cell RNA sequencing data is critical for downstream analyses of cellular differentiation and pathological mechanisms. Because traditional biological approaches are laborious and time-consuming, computational methods for cell classification are needed. However, existing methods struggle to make full use of the gene expression information contained in the large volume of unlabeled cell data, which limits their classification and generalization performance. We therefore propose scSSGC, a novel self-supervised graph representation learning framework for single-cell classification. In the self-supervised pre-training stage, multiple K-means clustering tasks on unlabeled cell data are used jointly to train the model, which mitigates the shortage of labeled data. To capture potential interactions among cells, we introduce a locally augmented graph neural network that improves information aggregation for nodes with few neighbors in the cell graph. A range of benchmark experiments shows that scSSGC outperforms existing state-of-the-art cell classification methods. More importantly, scSSGC maintains stable performance in cross-dataset settings, indicating better generalization ability.
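The pre-training idea in the abstract, several K-means clustering tasks run on unlabeled cells, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how the clustering tasks differ, so the sketch assumes one task per cluster count; the function names (`kmeans_labels`, `multi_kmeans_pseudo_labels`), the values in `ks`, and the toy expression matrix are all hypothetical. Each task yields one pseudo-label per cell, which a model could then be trained to predict.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(cells, k, seed=0, n_iter=50):
    """Plain Lloyd's K-means; returns one cluster index per cell."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct cells (copied so inputs stay intact).
    centroids = [list(c) for c in rng.sample(cells, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every cell.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(cell, centroids[j]))
            for cell in cells
        ]
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [c for c, lab in zip(cells, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def multi_kmeans_pseudo_labels(cells, ks, seed=0):
    """One pseudo-label vector per clustering task.

    Assumption: tasks differ only in the number of clusters k.
    """
    return {k: kmeans_labels(cells, k, seed=seed + i) for i, k in enumerate(ks)}


# Toy "expression matrix" (hypothetical): 6 cells x 2 genes.
cells = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
]
pseudo = multi_kmeans_pseudo_labels(cells, ks=[2, 3])
for k, labels in pseudo.items():
    print(f"k={k}: {labels}")
```

In a setup like this, each pseudo-label vector could serve as the target of a separate classification head during pre-training, with the losses combined; how scSSGC actually combines the tasks is not stated in the abstract.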