Shu Zhenqiu, Ren Yixuan, Long Qinghan, Wang Hongbin, Yu Zhengtao
Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650093, China.
J Chem Inf Model. 2025 Jun 23;65(12):6367-6381. doi: 10.1021/acs.jcim.5c00731. Epub 2025 Jun 5.
Single-cell RNA sequencing (scRNA-seq) has become a key technology for analyzing cellular diversity at single-cell resolution. Cell clustering is a central step in scRNA-seq data analysis because it identifies distinct cell types and uncovers potential subpopulations. However, most existing scRNA-seq clustering methods rely on a single view of the data, which leads to an incomplete interpretation. In addition, the high dimensionality of scRNA-seq data and its inevitable noise pose significant challenges for clustering. To address these challenges, we introduce a novel clustering method for scRNA-seq data, called graph attention network with subspace learning (scGANSL). Specifically, scGANSL first constructs two views of the data, one by screening highly variable genes (HVGs) and one by principal component analysis (PCA). The two views are then fed separately into a multiview shared graph autoencoder, where clustering labels guide the learning of the latent representations and the coefficient matrix. Furthermore, the method integrates a zero-inflated negative binomial (ZINB) model into a self-supervised graph attention autoencoder to learn latent representations more effectively. To preserve both the local and global structures of scRNA-seq data in the latent representation space, we introduce a local learning and self-expression strategy to guide model training. Experimental results across various scRNA-seq data sets demonstrate that the proposed scGANSL model significantly outperforms other state-of-the-art scRNA-seq clustering methods.
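The abstract does not give the form of the ZINB objective, so the following is only a minimal sketch of the standard zero-inflated negative binomial negative log-likelihood for a single count, not the authors' implementation. The function names `nb_log_pmf` and `zinb_nll` and the parameter names `mu` (mean), `theta` (dispersion), and `pi` (zero-inflation probability) are assumptions chosen for illustration.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial
    with mean mu > 0 and dispersion theta > 0 (assumed parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of count x under a ZINB model.

    pi (0 <= pi < 1) is the probability of an extra "dropout" zero;
    with probability 1 - pi the count comes from the negative binomial.
    """
    if x == 0:
        # A zero can come from dropout or from the NB component itself.
        p_zero = pi + (1.0 - pi) * math.exp(nb_log_pmf(0, mu, theta))
        return -math.log(p_zero)
    # A nonzero count can only come from the NB component.
    return -(math.log(1.0 - pi) + nb_log_pmf(x, mu, theta))


# Illustrative values only (not taken from the paper).
loss_zero = zinb_nll(0, mu=2.0, theta=1.5, pi=0.3)
loss_five = zinb_nll(5, mu=2.0, theta=1.5, pi=0.3)
```

In ZINB-based autoencoders generally, `mu`, `theta`, and `pi` are predicted per gene and per cell by decoder outputs, and this loss is averaged over the count matrix. Whether scGANSL follows exactly that layout is not stated in the abstract.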