Ellis Dorothy, Roy Arkaprava, Datta Susmita
Department of Biostatistics, University of Florida, Gainesville, FL, United States.
Front Genet. 2023 Jun 9;14:1179439. doi: 10.3389/fgene.2023.1179439. eCollection 2023.
The development of multimodal single-cell omics methods has enabled the collection of data across different omics modalities from the same set of single cells. Each omics modality provides unique information about cell type and function, so the ability to integrate data from different modalities can provide deeper insights into cellular functions. Often, single-cell omics data can prove challenging to model because of high dimensionality, sparsity, and technical noise. We propose a novel multimodal data analysis method called Joint graph-Regularized Single-Cell Kullback-Leibler Sparse Non-negative Matrix Factorization (jrSiCKLSNMF, pronounced "junior sickles NMF") that extracts latent factors shared across omics modalities within the same set of single cells. We compare our clustering algorithm to several existing methods on four sets of data simulated from third-party software. We also apply our algorithm to a real set of cell line data. We show overwhelmingly better clustering performance than several existing methods on the simulated data. On a real multimodal omics dataset, we also find our method to produce scientifically accurate clustering results.
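The core idea named in the abstract, a Kullback-Leibler non-negative matrix factorization with latent factors shared across modalities, can be illustrated with a minimal sketch. This is not the authors' jrSiCKLSNMF implementation: it omits the graph regularization and sparsity terms, and the function names (`joint_kl_nmf`, `kl_divergence`), the features-by-cells matrix orientation, and the use of standard multiplicative updates are all assumptions made for illustration.

```python
import numpy as np


def kl_divergence(X, WH, eps=1e-10):
    """Generalized KL divergence D(X || WH), summed over all entries."""
    return float(np.sum(X * np.log((X + eps) / (WH + eps)) - X + WH))


def joint_kl_nmf(Xs, k, n_iter=200, seed=0, eps=1e-10):
    """Factor each modality X_v (features_v x cells) as W_v @ H.

    H (k x cells) is shared by all modalities; each W_v is modality-specific.
    Hypothetical simplification: plain multiplicative KL updates, with no
    graph regularization and no sparsity penalty.
    """
    rng = np.random.default_rng(seed)
    n_cells = Xs[0].shape[1]
    Ws = [rng.random((X.shape[0], k)) + 0.1 for X in Xs]
    H = rng.random((k, n_cells)) + 0.1
    losses = []
    for _ in range(n_iter):
        # Update each modality-specific loading matrix W_v with H held fixed.
        for v, X in enumerate(Xs):
            ratio = X / (Ws[v] @ H + eps)
            Ws[v] *= (ratio @ H.T) / (H.sum(axis=1)[None, :] + eps)
        # Update the shared factor matrix H using evidence from all modalities.
        numer = sum(W.T @ (X / (W @ H + eps)) for W, X in zip(Ws, Xs))
        denom = sum(W.sum(axis=0) for W in Ws)[:, None]
        H *= numer / (denom + eps)
        # Track the summed KL objective after each full sweep.
        losses.append(sum(kl_divergence(X, W @ H) for X, W in zip(Xs, Ws)))
    return Ws, H, losses
```

Under these assumptions, each column of `H` is a low-dimensional representation of one cell that draws on every modality, so a standard clustering algorithm applied to the columns of `H` would yield cell clusters.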