School of Artificial Intelligence, Hebei University of Technology, Tianjin, China.
School of Artificial Intelligence, Jilin University, Jilin, China.
Bioinformatics. 2023 Feb 14;39(2). doi: 10.1093/bioinformatics/btad075.
Single-cell RNA sequencing (scRNA-seq) is an increasingly popular technique for transcriptomic analysis of gene expression at the single-cell level. Cell-type clustering is the first crucial task in the analysis of scRNA-seq data; it facilitates accurate identification of cell types and the study of the characteristics of their transcripts. Recently, several computational models based on deep autoencoders and ensemble clustering have been developed to analyze scRNA-seq data. However, current deep autoencoders are not sufficient to learn the latent representations of scRNA-seq data, and obtaining consensus partitions from these feature representations remains under-explored.
To address this challenge, we propose scBGEDA, a single-cell deep clustering model that combines a dual denoising autoencoder with bipartite graph ensemble clustering to identify specific cell populations in single-cell transcriptome profiles. First, a single-cell dual denoising autoencoder network is proposed to project the data into a compressed low-dimensional space and to learn feature representations by explicitly modeling the synergistic optimization of the zero-inflated negative binomial reconstruction loss and the denoising reconstruction loss. Then, a bipartite graph ensemble clustering algorithm is designed to exploit the relationships between cells and the learned latent embedded space by means of a graph-based consensus function. Multiple comparison experiments were conducted on 20 scRNA-seq datasets from different sequencing platforms using a variety of clustering metrics. The experimental results indicated that scBGEDA outperforms other state-of-the-art methods on these datasets and demonstrated its scalability to large-scale scRNA-seq datasets. Moreover, scBGEDA was able to identify cell-type-specific marker genes and provide functional genomic analysis by quantifying the influence of genes on cell clusters, bringing new insights into identifying cell types and characterizing scRNA-seq data from different perspectives.
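As a rough illustration of the two loss terms named above, the sketch below evaluates a standard zero-inflated negative binomial (ZINB) negative log-likelihood and adds a mean-squared denoising reconstruction error. It is not taken from the scBGEDA source code: the function names, the mean/dispersion/dropout parameterization (`mu`, `theta`, `pi`), and the weighting factor `gamma` are assumptions made for illustration, and the abstract does not state how the two terms are actually combined.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with
    mean mu > 0 and dispersion theta > 0 (standard parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of count x under a ZINB distribution.
    pi in [0, 1) is the probability of an extra (dropout) zero."""
    log_nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero arises either from dropout or from the NB component.
        return -math.log(pi + (1.0 - pi) * math.exp(log_nb))
    # Non-zero counts can only come from the NB component.
    return -(math.log(1.0 - pi) + log_nb)


def combined_loss(counts, mu, theta, pi, clean, recon, gamma=1.0):
    """Hypothetical joint objective: mean ZINB NLL plus gamma times the
    mean squared error between clean input and its reconstruction.
    The additive form and gamma are assumptions, not the paper's definition."""
    zinb = sum(
        zinb_nll(x, m, t, p) for x, m, t, p in zip(counts, mu, theta, pi)
    ) / len(counts)
    mse = sum((c - r) ** 2 for c, r in zip(clean, recon)) / len(clean)
    return zinb + gamma * mse
```

In a real autoencoder, `mu`, `theta`, and `pi` would be per-gene outputs of the decoder and the loss would be computed on tensors with automatic differentiation; the scalar version here only makes the formula concrete.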
The source code of scBGEDA is available at https://github.com/wangyh082/scBGEDA. The software and the supporting data can be downloaded from https://figshare.com/articles/software/scBGEDA/19657911.
Supplementary data are available at Bioinformatics online.