Wang Jing, Xia Junfeng, Wang Haiyun, Su Yansen, Zheng Chun-Hou
Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei, China.
Institutes of Physical Science and Information Technology, Anhui University, Hefei, China.
Brief Bioinform. 2023 Jan 19;24(1). doi: 10.1093/bib/bbac625.
Advances in single-cell ribonucleic acid sequencing (scRNA-seq) allow researchers to explore cellular heterogeneity and human diseases at single-cell resolution. Cell clustering is a prerequisite for scRNA-seq analysis because it identifies cell identities. However, the high dimensionality, noise and pronounced sparsity of scRNA-seq data make clustering a major challenge. Although many methods have emerged, they still fail to fully explore the intrinsic properties of cells and the relationships among cells, which seriously degrades downstream clustering performance. Here, we propose a new deep contrastive clustering algorithm called scDCCA. It integrates a denoising auto-encoder and a dual contrastive learning module into a deep clustering framework to extract valuable features and cluster cells. Specifically, to characterize the data better and learn robust representations, scDCCA uses a denoising auto-encoder based on the Zero-Inflated Negative Binomial (ZINB) model to extract low-dimensional features. Meanwhile, scDCCA incorporates a dual contrastive learning module to capture the pairwise proximity of cells. By increasing the similarity within positive pairs and the dissimilarity within negative pairs, contrasts at both the instance level and the cluster level help the model learn more discriminative features and achieve better cell segregation. Furthermore, scDCCA couples feature learning with clustering, so that representation learning and cell clustering are performed end to end. Experimental results on 14 real datasets show that scDCCA outperforms eight state-of-the-art methods in terms of accuracy, generalizability, scalability and efficiency. Cell visualization and biological analysis demonstrate that scDCCA significantly improves clustering and facilitates downstream analysis of scRNA-seq data. The code is available at https://github.com/WJ319/scDCCA.
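To make the instance-level contrast concrete, the following is a minimal sketch of a normalized-temperature cross-entropy (NT-Xent style) loss, a form commonly used for contrastive learning. The abstract does not state the exact loss used by scDCCA, so this is an illustrative assumption rather than the authors' implementation. The function names, the temperature value and the toy embeddings are all hypothetical. Each cell is assumed to have two embeddings, one per augmented (for example, noise-corrupted) view; the two views of the same cell form the positive pair and every other embedding in the batch is treated as a negative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def instance_contrastive_loss(view_a, view_b, temperature=0.5):
    """NT-Xent style loss (illustrative, not confirmed as scDCCA's loss).

    view_a[i] and view_b[i] are embeddings of two views of cell i.
    The temperature default is an arbitrary value chosen for this sketch.
    """
    n = len(view_a)
    z = view_a + view_b  # 2n embeddings; the positive of index i is i + n (mod 2n)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)
        # Scaled similarities to every other embedding (positive and negatives).
        logits = [cosine(z[i], z[j]) / temperature
                  for j in range(2 * n) if j != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        # Numerically stable log-sum-exp over the logits.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the positive pair as the target class.
        total += log_denominator - pos_logit
    return total / (2 * n)


# Toy usage: two cells, each with two views in a 2-D embedding space.
cells_view_a = [[1.0, 0.0], [0.0, 1.0]]
cells_view_b = [[0.9, 0.1], [0.1, 0.9]]
loss = instance_contrastive_loss(cells_view_a, cells_view_b)
```

Minimizing this quantity pulls the two views of each cell together and pushes different cells apart. A cluster-level contrast can be sketched the same way by applying the function to the columns of the soft cluster-assignment matrices of the two views instead of the per-cell embeddings.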