Chen Jie, Sun Qiucheng, Wang Chunyan, Gao Changbo
School of Computer Science and Technology, Changchun Normal University, Changchun, 130032, China.
Comput Struct Biotechnol J. 2025 Mar 14;27:1090-1102. doi: 10.1016/j.csbj.2025.03.018. eCollection 2025.
Single-cell RNA sequencing (scRNA-seq) enables the analysis of the genome, transcriptome, and epigenome at the single-cell level, providing a critical tool for understanding cellular heterogeneity and diversity. Cell clustering, a key step in scRNA-seq data analysis, reveals population structure by grouping cells with similar expression patterns. However, because scRNA-seq data are high-dimensional and sparse, the performance of existing clustering algorithms remains suboptimal. In this study, we propose a novel clustering algorithm, scCCTR, which performs semi-supervised classification by guiding a deep learning model through the iterative selection of high-confidence cells and their labels. The algorithm consists of two main components: an iterative selection module and a semi-supervised classification module. In the iterative selection module, scCCTR progressively selects high-confidence cells that exhibit core group features, iteratively optimizes the feature representations, and builds a consensus clustering result across the iterations. In the semi-supervised classification module, scCCTR uses the selected core data to train a Transformer neural network, whose multi-head attention mechanism focuses on critical information and thereby achieves higher clustering precision. We compared scCCTR with several established cell clustering methods on real datasets, and the results demonstrate that scCCTR outperforms existing methods in accuracy and effectiveness for both cell clustering and visualization. (The code of scCCTR is freely available for academic use at https://github.com/chenjiejie387/scCCTR.)
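The general idea of the selection step (build a consensus over repeated clusterings, keep only the "core" cells whose label is stable, then train a classifier on those cells alone and use it to label every cell) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The perturbation scheme (Gaussian jitter added before each k-means run), the confidence score (fraction of runs agreeing with the majority label), the 0.9 threshold, the synthetic data, and the nearest-centroid classifier standing in for scCCTR's Transformer are all assumptions, and every helper name is hypothetical. The sketch also performs a single selection round rather than the iterative refinement described in the abstract.

```python
import itertools
import numpy as np


def kmeans_pp(X, k, rng, n_iter=100):
    """Lloyd's k-means with k-means++ seeding; returns (labels, inertia)."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # squared distance of every cell to its nearest already-chosen center
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    return labels, d[np.arange(len(X)), labels].sum()


def align(labels, ref, k):
    """Relabel `labels` with the permutation that best matches `ref` (brute force, small k only)."""
    best = max(itertools.permutations(range(k)),
               key=lambda p: np.sum(np.asarray(p)[labels] == ref))
    return np.asarray(best)[labels]


def select_core_cells(X, k, n_runs=10, jitter=1.0, threshold=0.9, seed=0):
    """Hypothetical perturbation-based consensus; returns (consensus, confidence, core mask)."""
    rng = np.random.default_rng(seed)
    # each run clusters a freshly jittered copy of the data (assumed perturbation scheme)
    runs = [kmeans_pp(X + rng.normal(scale=jitter, size=X.shape), k, rng) for _ in range(n_runs)]
    ref = min(runs, key=lambda r: r[1])[0]                       # lowest-inertia run as reference
    aligned = np.array([align(lab, ref, k) for lab, _ in runs])  # shape (n_runs, n_cells)
    votes = np.stack([(aligned == c).sum(axis=0) for c in range(k)])  # shape (k, n_cells)
    consensus = votes.argmax(axis=0)
    confidence = votes.max(axis=0) / n_runs  # fraction of runs agreeing with the majority label
    return consensus, confidence, confidence >= threshold


def nearest_centroid(X_core, y_core, X):
    """Stand-in classifier (NOT scCCTR's Transformer): assign each cell to the closest core centroid."""
    classes = np.unique(y_core)
    centroids = np.array([X_core[y_core == c].mean(axis=0) for c in classes])
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]


# Synthetic 2-D "cell types" plus a few ambiguous cells placed between them.
rng = np.random.default_rng(42)
means = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.66]])
X_blobs = np.vstack([m + rng.normal(size=(50, 2)) for m in means])
X_amb = means.mean(axis=0) + rng.normal(scale=0.1, size=(5, 2))
X = np.vstack([X_blobs, X_amb])
truth = np.repeat(np.arange(3), 50)

consensus, confidence, core = select_core_cells(X, k=3)
labels = nearest_centroid(X[core], consensus[core], X)
print(core.sum(), "core cells of", len(X))
```

Cells that repeatedly switch clusters under perturbation receive low confidence and are left out of the training set, while every cell still receives a label from the classifier. The brute-force label alignment is only practical for small k; for larger k, `scipy.optimize.linear_sum_assignment` on the run-versus-reference contingency table is the usual replacement.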