School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China.
Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad475.
Single-cell sequencing technology has provided unprecedented opportunities for comprehensively deciphering cell heterogeneity. Nevertheless, the high dimensionality and intricate nature of cell heterogeneity present substantial challenges to computational methods. Numerous novel clustering methods have been proposed to address this issue. However, none of these methods performs consistently better than the others across different biological scenarios. In this study, we developed CAKE, a novel and scalable self-supervised clustering method, which consists of a contrastive learning model with a mixture neighborhood augmentation for cell representation learning, and a self-Knowledge Distiller model for the refinement of clustering results. These designs provide more condensed and cluster-friendly cell representations and improve clustering performance in terms of accuracy and robustness. Furthermore, in addition to accurately identifying the major cell types, CAKE can also find more biologically meaningful cell subgroups and rare cell types. Comprehensive experiments on real single-cell RNA sequencing datasets demonstrated the superiority of CAKE in visualization and clustering over the comparison methods, and indicated its broad applicability to cell heterogeneity analysis. Contact: Ruiqing Zheng (rqzheng@csu.edu.cn).
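The abstract names a contrastive learning model with a mixture neighborhood augmentation but gives no implementation details. The following is a minimal illustrative sketch of that general idea, not the CAKE implementation. It assumes that the augmentation can be read as mixing each cell's expression profile with one of its nearest neighbors, and that the contrastive objective is a standard InfoNCE loss; the function names `neighbor_mixup` and `info_nce`, the parameters `k`, `lam` and `temperature`, and the toy data are all hypothetical. A real pipeline would also pass each view through an encoder network before computing the loss, which is omitted here.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighbor_mixup(cells, k=1, lam=0.8, rng=None):
    """Build one augmented view per cell by mixing it with a random
    one of its k nearest neighbors.

    ASSUMPTION: this is one plausible reading of "mixture neighborhood
    augmentation"; the abstract does not define the operation.
    """
    rng = rng or random.Random(0)
    views = []
    for i, cell in enumerate(cells):
        # Rank all other cells by distance and keep the k closest.
        others = sorted(
            (j for j in range(len(cells)) if j != i),
            key=lambda j: euclidean(cell, cells[j]),
        )
        neighbor = cells[rng.choice(others[:k])]
        # Convex combination of the cell and its sampled neighbor.
        views.append([lam * x + (1 - lam) * y for x, y in zip(cell, neighbor)])
    return views


def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss: view_a[i] should match view_b[i] and be
    dissimilar to every other view_b[j]."""
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_denominator - logits[i]
    return total / len(view_a)


# Toy expression matrix (hypothetical): two cells per "cell type".
cells = [
    [5.0, 0.1, 0.2],
    [4.8, 0.2, 0.1],
    [0.1, 5.0, 0.3],
    [0.2, 4.9, 0.1],
]
view_a = neighbor_mixup(cells, k=1, rng=random.Random(1))
view_b = neighbor_mixup(cells, k=1, rng=random.Random(2))
loss = info_nce(view_a, view_b)
print(f"contrastive loss on toy data: {loss:.3f}")
```

Minimizing such a loss pulls the two views of the same cell together and pushes views of different cells apart, which is the general mechanism by which contrastive learning can yield compact, cluster-friendly representations. The self-knowledge distillation step described in the abstract is not sketched here because the abstract does not describe how it operates.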