A technical review of multi-omics data integration methods: from classical statistical to deep generative approaches.

Author information

Baião Ana R, Cai Zhaoxiang, Poulos Rebecca C, Robinson Phillip J, Reddel Roger R, Zhong Qing, Vinga Susana, Gonçalves Emanuel

Affiliations

INESC-ID, Rua Alves Redol 9, 1000-029 Lisboa, Portugal.

Instituto Superior Técnico (IST), Universidade de Lisboa, Av. Rovisco Pais, 1049-001 Lisboa, Portugal.

Publication information

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf355.

DOI:10.1093/bib/bbaf355

PMID:40748323

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12315550/

Abstract

The rapid advancement of high-throughput sequencing and other assay technologies has resulted in the generation of large and complex multi-omics datasets, offering unprecedented opportunities for advancing precision medicine. However, multi-omics data integration remains challenging due to the high dimensionality, heterogeneity, and frequency of missing values across data types. Computational methods leveraging statistical and machine learning approaches have been developed to address these issues and uncover complex biological patterns, improving our understanding of disease mechanisms. Here, we comprehensively review state-of-the-art multi-omics integration methods with a focus on deep generative models, particularly variational autoencoders (VAEs), which have been widely used for data imputation, augmentation, and batch effect correction. We explore the technical aspects of VAE loss functions and regularisation techniques, including adversarial training, disentanglement, and contrastive learning. Moreover, we highlight recent advancements in foundation models and multimodal data integration, outlining future directions in precision medicine research.
