Yuan Lin, Xu Zhijie, Meng Boyuan, Ye Lan
Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), 3501 Daxue Road, Jinan, 250353, China.
Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), 3501 Daxue Road, Jinan, 250353, China.
BMC Genomics. 2025 Apr 7;26(1):350. doi: 10.1186/s12864-025-11511-2.
Clustering plays a vital role in scRNA-seq data analysis and in the downstream analyses that depend on it. Many computational methods have been proposed and have achieved remarkable results. However, these methods have several limitations. First, they do not fully exploit cellular features. Second, they are built on gene expression information alone and lack flexibility in integrating intercellular relationships. Finally, their performance is affected by dropout events.
We propose scAMZI, a novel deep learning (DL) model for clustering scRNA-seq data that is based on an attention autoencoder and a zero-inflated (ZI) layer. scAMZI is mainly composed of SimAM (a Simple, parameter-free Attention Module), an autoencoder, a ZINB (Zero-Inflated Negative Binomial) model, and a ZI layer. Building on the ZINB model, we introduce the autoencoder and SimAM to reduce the dimensionality of the data and to learn feature representations of cells and the relationships between cells. Meanwhile, the ZI layer is used to handle zero values in the data. We compare the performance of scAMZI with nine methods (three shallow learning algorithms and six state-of-the-art DL-based methods) on fourteen benchmark scRNA-seq datasets of various sizes (from hundreds to tens of thousands of cells) with known cell types. Experimental results demonstrate that scAMZI outperforms the competing methods.
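To make the role of the ZINB model concrete, the following is a minimal sketch of a zero-inflated negative binomial log-likelihood for a single gene. It uses the common parameterization with mean `mu`, dispersion `theta`, and zero-inflation (dropout) probability `pi`. These names, and the function names `nb_log_pmf`, `zinb_log_pmf`, and `zinb_nll`, are assumptions made for illustration. The abstract does not give the exact loss used by scAMZI or the form of its ZI layer, so this should not be read as the authors' implementation.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial
    with mean mu > 0 and dispersion theta > 0."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a ZINB model.

    pi (assumed 0 <= pi < 1) is the probability that an observation
    is an excess zero, for example a dropout event.
    """
    nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero arises either from the dropout component
        # or from the negative binomial itself.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    # A non-zero count can only come from the negative binomial component.
    return math.log(1.0 - pi) + nb


def zinb_nll(counts, mu, theta, pi):
    """Negative log-likelihood of a list of counts sharing one (mu, theta, pi)."""
    return -sum(zinb_log_pmf(x, mu, theta, pi) for x in counts)


if __name__ == "__main__":
    # Hypothetical counts for one gene across six cells (illustrative only).
    counts = [0, 0, 3, 0, 1, 7]
    print(zinb_nll(counts, mu=2.0, theta=1.5, pi=0.3))
```

In a ZINB-based autoencoder, the decoder typically outputs `mu`, `theta`, and `pi` for every gene in every cell, and a loss of this kind is minimized over all entries of the count matrix. Whether scAMZI follows exactly this scheme is not stated in the abstract.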
scAMZI outperforms competing methods and can facilitate downstream analyses such as cell annotation, marker gene discovery, and cell trajectory inference. The scAMZI package is freely available at https://doi.org/10.5281/zenodo.13131559.