Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China.
Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, 518129, China.
Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac347.
Advances in single-cell RNA sequencing (scRNA-seq) technologies have provided an unprecedented opportunity for cell-type identification. As clustering is an effective strategy for cell-type identification, various computational approaches have been proposed for clustering scRNA-seq data. Recently, with the emergence of cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq), the cell surface expression of specific proteins and the RNA expression of the same cell can be captured together, which provides more comprehensive information for cell analysis. However, existing single-cell clustering algorithms are mainly designed for single-omic data and have difficulty handling multi-omics data with diverse characteristics efficiently. In this study, we propose a novel deep embedded multi-omics clustering with collaborative training (DEMOC) model to perform joint clustering on CITE-seq data. Our model takes into account the characteristics of transcriptomic and proteomic data and makes effective use of the consistent and complementary information provided by the different data sources. Experimental results on two real CITE-seq datasets demonstrate that our DEMOC model not only outperforms state-of-the-art single-omic clustering methods, but also achieves better and more stable performance than existing multi-omics clustering methods. We also apply our model to three scRNA-seq datasets to assess its performance in rare cell-type identification, novel cell-subtype detection and cellular heterogeneity analysis. The experimental results illustrate the effectiveness of our model in discovering the underlying patterns of the data.