Zhang Zhi, Sun Qiucheng, Wang Chunyan, Jiang Songrun
College of Computer Science and Technology, Changchun Normal University, Changchun 130032, China.
Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf368.
In recent years, single-cell data analysis has advanced considerably, particularly in clustering methods, and a growing body of research targets clustering algorithms tailored to single-cell RNA sequencing (scRNA-seq) data. Conventional methods focus primarily on local relationships among cells or among genes and overlook global cell-gene interactions. As a result, the high dimensionality, noise, and sparsity of the data continue to limit clustering accuracy. To address these challenges, we propose scGGC, a novel single-cell clustering model that integrates a graph autoencoder with generative adversarial network techniques. scGGC makes two contributions: (i) it constructs an adjacency matrix that incorporates both cell-cell and cell-gene relationships, capturing complex interactions in a single graph structure and enabling nonlinear dimensionality reduction and initial clustering via a graph autoencoder; (ii) it improves clustering performance by selecting high-confidence samples from the initial clusters and using them to train an adversarial neural network. A comprehensive evaluation on nine publicly available scRNA-seq datasets shows that scGGC outperforms eight comparison methods. For example, on datasets such as MHC3K, the Adjusted Rand Index increases by an average of 10.1%. Marker gene identification and cell type annotation further confirm the biological relevance of scGGC, with marker gene overlap rates exceeding 70% across multiple datasets. We conclude that scGGC not only improves the accuracy of single-cell data clustering but also enhances the identification of cell-type-specific marker genes. The scGGC code is available at https://github.com/Zhi1002/scGGC.
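The two components described in the abstract can be illustrated with a small, dependency-free sketch. It is not the scGGC implementation: the function names, the cosine-similarity k-nearest-neighbour rule for cell-cell edges, the "expression > 0" rule for cell-gene edges, and the distance-to-centroid confidence criterion are all assumptions made for illustration, and the graph autoencoder and adversarial training steps are omitted entirely.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_joint_adjacency(expr, k=2):
    """Build a symmetric adjacency matrix over cell nodes followed by gene nodes.

    expr is a cells x genes expression matrix (list of lists).
    Assumed rules: each cell is linked to its k most similar cells,
    and each cell is linked to every gene it expresses.
    """
    n_cells, n_genes = len(expr), len(expr[0])
    n = n_cells + n_genes
    adj = [[0.0] * n for _ in range(n)]

    # Cell-cell edges: k nearest neighbours by cosine similarity.
    for i in range(n_cells):
        sims = sorted(
            ((cosine(expr[i], expr[j]), j) for j in range(n_cells) if j != i),
            reverse=True,
        )
        for _, j in sims[:k]:
            adj[i][j] = adj[j][i] = 1.0

    # Cell-gene edges: gene g is stored at node index n_cells + g.
    for i in range(n_cells):
        for g in range(n_genes):
            if expr[i][g] > 0:
                adj[i][n_cells + g] = adj[n_cells + g][i] = 1.0
    return adj


def select_high_confidence(embeddings, labels, keep_fraction=0.5):
    """Return sorted indices of the cells closest to their own cluster centroid.

    Assumed criterion: within each initial cluster, keep the given fraction
    of cells (at least one) with the smallest Euclidean distance to the centroid.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    selected = []
    for members in clusters.values():
        dim = len(embeddings[members[0]])
        centroid = [
            sum(embeddings[i][d] for i in members) / len(members) for d in range(dim)
        ]
        ranked = sorted(members, key=lambda i: math.dist(embeddings[i], centroid))
        n_keep = max(1, int(len(members) * keep_fraction))
        selected.extend(ranked[:n_keep])
    return sorted(selected)


if __name__ == "__main__":
    # Toy data: 4 cells x 3 genes, then 6 hypothetical 2-D embeddings in 2 clusters.
    expr = [[5, 0, 1], [4, 0, 0], [0, 3, 0], [0, 4, 1]]
    adj = build_joint_adjacency(expr, k=1)
    print(len(adj), "nodes in the joint cell-gene graph")

    emb = [[0, 0], [0.1, 0], [2, 0], [5, 5], [5.1, 5], [9, 9]]
    print(select_high_confidence(emb, [0, 0, 0, 1, 1, 1], keep_fraction=0.67))
```

Placing cells and genes in one node set lets a single graph carry both kinds of relationship, which is the idea behind component (i). The selected indices stand in for the high-confidence samples of component (ii), which in scGGC would then feed adversarial training.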