IEEE J Biomed Health Inform. 2023 Dec;27(12):6121-6132. doi: 10.1109/JBHI.2023.3317272. Epub 2023 Dec 5.
Cell type identification is a crucial step in the study of cellular heterogeneity and biological processes. Advances in single-cell sequencing technology have enabled the development of a variety of clustering methods for cell type identification. However, most existing methods are designed for clustering single-omics data such as single-cell RNA-sequencing (scRNA-seq) data. The accumulation of single-cell multi-omics data provides a great opportunity to integrate different omics data for cell clustering, but also raises new computational challenges for existing methods. How to integrate multi-omics data and leverage their consensus and complementary information to improve the accuracy of cell clustering remains a challenge. In this study, we propose a new deep multi-level information fusion framework, named scMIC, for clustering single-cell multi-omics data. Our model integrates the attribute information of cells with the potential structural relationships among cells at both the local and global levels, and reduces redundant information between different omics at both the cell and feature levels, leading to more discriminative representations. Moreover, the proposed multiple collaborative supervised clustering strategy guides the learning process of the core encoding part by learning a high-confidence target distribution. This facilitates the interaction between the clustering part and the representation learning part, as well as the information exchange between omics, and ultimately yields more robust clustering results. Experiments on seven single-cell multi-omics datasets show the superiority of scMIC over existing state-of-the-art methods.
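The abstract does not state how the high-confidence target distribution is computed. As a minimal sketch, assuming the formulation commonly used in deep embedded clustering (a Student's t soft assignment that is squared and renormalized to form the target), the idea can be illustrated as follows. The function names, the random embeddings, and the cluster centers are hypothetical and are not taken from scMIC.

```python
import numpy as np

def soft_assign(z, centers, alpha=1.0):
    """Soft cluster assignments from a Student's t kernel (assumed, DEC-style).

    z: (n_cells, dim) embeddings; centers: (n_clusters, dim) cluster centers.
    """
    # Squared Euclidean distance between every cell and every center.
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    # Normalize so that each cell's assignments sum to 1.
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Square the soft assignments and renormalize.

    Squaring emphasizes confident assignments; dividing by the soft
    cluster frequency keeps large clusters from dominating.
    """
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q), the self-supervised clustering loss in this sketch."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Hypothetical data: 6 cells with 4-dimensional embeddings, 3 clusters.
rng = np.random.default_rng(0)
z = rng.normal(size=(6, 4))
centers = rng.normal(size=(3, 4))

q = soft_assign(z, centers)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

In a training loop of this kind, minimizing `loss` with respect to the encoder and the cluster centers pulls the soft assignments toward the target. How scMIC combines such targets across omics is not described in the abstract and is not modeled here.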