College of Chemistry, Sichuan University, Chengdu, Sichuan, China.
West China Biomedical Big Data Center, West China Hospital, Sichuan University, Sichuan, China.
PLoS Comput Biol. 2023 Nov 10;19(11):e1011641. doi: 10.1371/journal.pcbi.1011641. eCollection 2023 Nov.
Single-cell RNA sequencing (scRNA-seq) technology provides higher resolution of cellular differences than bulk RNA sequencing and reveals heterogeneity in biological systems. The analysis of scRNA-seq datasets is premised on subpopulation assignment. When an appropriate reference, such as specific marker genes or a single-cell reference atlas, is not available, unsupervised clustering approaches become the predominant option. However, the inherent sparsity and high dimensionality of scRNA-seq datasets pose specific analytical challenges to traditional clustering methods. Various deep learning-based methods have therefore been proposed to address these challenges. Because each method addresses only part of the problem, a more comprehensive method is needed. In this article, we propose a novel scRNA-seq data clustering method named AttentionAE-sc (Attention fusion AutoEncoder for single-cell). It combines two different scRNA-seq clustering strategies through an attention mechanism: zero-inflated negative binomial (ZINB)-based methods, which deal with the impact of dropout events, and graph autoencoder (GAE)-based methods, which rely on information from neighbors to guide dimension reduction. Based on an iterative fusion between denoising and topological embeddings, AttentionAE-sc can easily acquire clustering-friendly cell representations in which similar cells lie closer together in the hidden embedding. Compared with several state-of-the-art baseline methods, AttentionAE-sc demonstrated excellent clustering performance on 16 real scRNA-seq datasets without the need to specify the number of groups. Additionally, AttentionAE-sc learned improved cell representations and exhibited enhanced stability and robustness. Furthermore, AttentionAE-sc achieved remarkable identification in a breast cancer single-cell atlas dataset and provided valuable insights into the heterogeneity among different cell subtypes.
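To make the ZINB component mentioned above concrete, the following is a minimal sketch of the standard zero-inflated negative binomial likelihood for a single gene. It is not taken from the AttentionAE-sc implementation. The function names (`nb_log_pmf`, `zinb_pmf`, `zinb_nll`) and the parameterization by mean `mu`, dispersion `theta`, and dropout probability `pi` are assumptions chosen for illustration. In a ZINB-based autoencoder these parameters would typically be predicted by the decoder, and the negative log-likelihood would serve as the reconstruction loss.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial
    with mean mu > 0 and dispersion theta > 0 (illustrative parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_pmf(x, mu, theta, pi):
    """Probability of count x under a zero-inflated negative binomial.

    pi is the probability of a dropout event (an extra zero);
    with probability 1 - pi the count comes from the negative binomial.
    """
    nb = math.exp(nb_log_pmf(x, mu, theta))
    if x == 0:
        # A zero is either a dropout or a genuine zero drawn from the NB.
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb


def zinb_nll(counts, mu, theta, pi):
    """Negative log-likelihood of observed counts sharing one set of
    ZINB parameters (hypothetical helper for illustration)."""
    return -sum(math.log(zinb_pmf(x, mu, theta, pi)) for x in counts)


if __name__ == "__main__":
    # A sparse toy expression vector: mostly zeros with a few positive counts.
    toy_counts = [0, 0, 0, 3, 0, 1, 0, 0, 5, 0]
    print(zinb_nll(toy_counts, mu=2.0, theta=1.5, pi=0.3))
```

The zero-inflation term lets the model assign extra probability mass to observed zeros, which is how ZINB-based approaches account for dropout events in sparse count matrices.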