A hybrid adversarial autoencoder-graph network model with dynamic fusion for robust scRNA-seq clustering.

Author information

Tang Binhua, Feng Yingying, Gao Xinyu

Affiliations

Key Laboratory of Maritime Intelligent Cyberspace Technology (Ministry of Education of China), Hohai University, 213200, Nanjing, China.

Shanghai Key Laboratory of Data Science, Fudan University, Shanghai, 200438, China.

Publication information

BMC Genomics. 2025 Aug 18;26(1):749. doi: 10.1186/s12864-025-11941-y.

DOI:10.1186/s12864-025-11941-y

PMID:40826008

Full-text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC12359927/

Abstract

BACKGROUND

Single-cell RNA sequencing (scRNA-seq) allows the exploration of biological heterogeneity among different cell types within tissues at a single-cell resolution. Cell clustering serves as a foundation for scRNA-seq data analysis and provides new insights into the heterogeneity of cells within complex tissues. However, the inherent features of scRNA-seq data, such as heterogeneity, sparsity, and high dimensionality, pose significant technical challenges for effective cell clustering.

RESULTS

Here, we present scCAGN, a novel deep clustering method based on an adversarial autoencoder (AAE) and a cross-attention graph convolutional network (GCN), to address the above challenges in scRNA-seq data analysis. Specifically, to enhance data reconstruction, scCAGN uses an adversarial autoencoder to strengthen the encoder. Graph feature representations obtained via the GCN are integrated through a dynamic information fusion mechanism, yielding enhanced feature representations. In addition, scCAGN combines three different loss functions to optimize clustering performance through a joint clustering approach. By leveraging this information fusion and joint clustering mechanism, scCAGN extracts deep cell features without labeled information, thus improving cell classification efficiency. Our findings show that scCAGN surpasses existing methods in clustering performance across eight typical scRNA-seq datasets, achieving a maximum Normalized Mutual Information (NMI) improvement of 11.94% and reaching an NMI of 0.9732 on the QS_diaphragm dataset. Relative to the lowest-performing method, it shows an average NMI improvement of 13% across the eight benchmark datasets. Further ablation and hyperparameter analyses validated the robustness of the proposed method. The code is available at http://github.com/gladex/scCAGN.
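
The abstract does not say how the dynamic information fusion is computed. As a rough illustration only, the sketch below blends an autoencoder embedding with a GCN embedding using softmax weights, so that the mix can differ from cell to cell. The function names `softmax` and `fuse`, the two-score gate, and the idea that the scores come from a learned layer are all assumptions made for this example; they are not taken from the scCAGN code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(z_ae, z_gcn, gate_scores):
    """Blend two same-length embeddings of one cell.

    gate_scores holds two scalars (hypothetical): in a trained model they
    would be produced per cell by a learned layer, which is what would make
    the fusion "dynamic" rather than a fixed average.
    """
    if len(z_ae) != len(z_gcn):
        raise ValueError("embeddings must have the same length")
    if len(gate_scores) != 2:
        raise ValueError("expected one score per embedding source")
    w_ae, w_gcn = softmax(gate_scores)
    return [w_ae * a + w_gcn * g for a, g in zip(z_ae, z_gcn)]


# Equal scores give an even blend of the two sources.
fused_even = fuse([1.0, 0.0], [0.0, 1.0], [0.0, 0.0])

# A much larger first score pushes the result toward the autoencoder embedding.
fused_ae = fuse([1.0, 0.0], [0.0, 1.0], [10.0, 0.0])
```

With equal scores the two sources contribute equally; as one score grows, the fused vector approaches that source's embedding. The actual mechanism in scCAGN may differ and should be checked against the paper and repository.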

CONCLUSION

scCAGN integrates AAE and cross-attention GCN with dynamic fusion, achieving state-of-the-art scRNA-seq clustering (0.9732 NMI, 13% average gain) across eight datasets. Validated via ablation and hyperparameter tests, it advances label-free cell discovery and enables further multimodal integration to dissect cellular heterogeneity.
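
For readers unfamiliar with the NMI metric reported above, the following is a minimal standard-library sketch of how NMI between reference cell labels and predicted cluster labels can be computed. It assumes the arithmetic-mean normalization, 2·I(U;V) / (H(U) + H(V)); the abstract does not state which normalization the authors used, so scores from this function are not guaranteed to match theirs. The function name `normalized_mutual_info` is chosen for this example.

```python
import math
from collections import Counter


def normalized_mutual_info(labels_true, labels_pred):
    """NMI with arithmetic-mean normalization: 2*I(U;V) / (H(U) + H(V))."""
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("label lists must be non-empty and of equal length")

    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    def entropy(counts):
        # Shannon entropy (natural log) of the empirical label distribution.
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    # Mutual information from the joint and marginal label counts.
    mi = 0.0
    for (t, p), c in count_joint.items():
        mi += (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))

    denom = entropy(count_true) + entropy(count_pred)
    if denom == 0.0:
        # Both partitions put every cell in one cluster; treated as perfect
        # agreement here by convention (an assumption of this sketch).
        return 1.0
    return 2.0 * mi / denom


# Same partition with swapped cluster ids: NMI ignores the label names.
nmi_same = normalized_mutual_info([0, 0, 1, 1], [1, 1, 0, 0])

# Predicted clusters carry no information about the reference labels.
nmi_indep = normalized_mutual_info([0, 0, 1, 1], [0, 1, 0, 1])
```

Because NMI depends only on how cells are grouped, not on the cluster ids themselves, it suits unsupervised clustering, where predicted cluster numbers are arbitrary.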
