Ballard Jenna L, Dai Zongyu, Shen Li, Long Qi
Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, USA.
Graduate Group in Applied Mathematics and Computational Science, University of Pennsylvania, 209 S. 33rd Street, Philadelphia, PA 19104, USA.
bioRxiv. 2025 Jun 22:2025.06.16.659949. doi: 10.1101/2025.06.16.659949.
Integrative analysis of multi-omics data provides a more comprehensive and nuanced view of a subject's biological state. However, high dimensionality and pervasive modality missingness present significant analytical challenges. Existing methods for incomplete multi-omics data are scarce, do not fully leverage both modality-specific and shared information, and produce task-biased representations. We propose JASMINE, a self-supervised representation learning method for incomplete multi-omics data that preserves both modality-specific and joint information and enhances sample similarity structure. JASMINE produces embeddings that achieve superior performance across multiple tasks on two different incomplete multi-omics datasets while requiring only a single round of training per dataset.