Deep learning-based approaches for multi-omics data integration and analysis.

Authors

Ballard Jenna L, Wang Zexuan, Li Wenrui, Shen Li, Long Qi

Affiliations

Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA.

Graduate Group in Applied Mathematics and Computational Science, University of Pennsylvania, 209 S. 33rd Street, Philadelphia, PA, 19104, USA.

Publication information

BioData Min. 2024 Oct 2;17(1):38. doi: 10.1186/s13040-024-00391-z.

DOI:10.1186/s13040-024-00391-z

PMID:39358793

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11446004/

Abstract

BACKGROUND

The rapid growth of deep learning, together with the vast and ever-growing amount of available data, has provided ample opportunity for advances in the fusion and analysis of complex and heterogeneous data types. Different data modalities provide complementary information that can be leveraged to gain a more complete understanding of each subject. In the biomedical domain, multi-omics data include molecular (genomics, transcriptomics, proteomics, epigenomics, metabolomics, etc.) and imaging (radiomics, pathomics) modalities which, when combined, have the potential to improve performance on prediction, classification, clustering and other tasks. Deep learning encompasses a wide variety of methods, each of which has certain strengths and weaknesses for multi-omics integration.

METHOD

In this review, we categorize recent deep learning-based approaches by their basic architectures and discuss their unique capabilities in relation to one another. We also discuss some emerging themes advancing the field of multi-omics integration.

RESULTS

Deep learning-based multi-omics integration methods were categorized broadly into non-generative (feedforward neural networks, graph convolutional neural networks, and autoencoders) and generative (variational methods, generative adversarial models, and a generative pretrained model). Generative methods have the advantage of being able to impose constraints on the shared representations to enforce certain properties or incorporate prior knowledge. They can also be used to generate or impute missing modalities. Recent advances achieved by these methods include the ability to handle incomplete data as well as going beyond the traditional molecular omics data types to integrate other modalities such as imaging data.
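As a rough illustration of two ideas mentioned above, a shared representation across modalities and imputation of a missing modality, the following minimal sketch may help. It is not a method from the review: a linear autoencoder fitted by truncated SVD stands in for the deep autoencoders discussed, the data are synthetic, and every name (`fit_linear_autoencoder`, `encode`, `impute_b_from_a`, the dimensions and noise level) is an assumption made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for two omics modalities driven by the same latent factors.
n_samples, latent_dim, d_a, d_b = 200, 3, 20, 15
z_true = rng.normal(size=(n_samples, latent_dim))
w_a = rng.normal(size=(latent_dim, d_a))
w_b = rng.normal(size=(latent_dim, d_b))
x_a = z_true @ w_a + 0.01 * rng.normal(size=(n_samples, d_a))
x_b = z_true @ w_b + 0.01 * rng.normal(size=(n_samples, d_b))


def fit_linear_autoencoder(x_a, x_b, latent_dim):
    """Fit a tied-weight linear autoencoder on the concatenated modalities.

    The top right-singular vectors of the centered data give the optimal
    linear decoder (this is equivalent to PCA).
    """
    x = np.hstack([x_a, x_b])
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:latent_dim]  # decoder shape: (latent_dim, d_a + d_b)


def encode(x_a, x_b, mean, decoder):
    """Map both modalities to the shared latent representation."""
    return (np.hstack([x_a, x_b]) - mean) @ decoder.T


def impute_b_from_a(x_a, mean, decoder, d_a):
    """Estimate modality B when only modality A was measured.

    The latent code is recovered by least squares from the A-part of the
    decoder, then decoded through the B-part.
    """
    dec_a, dec_b = decoder[:, :d_a], decoder[:, d_a:]
    z, *_ = np.linalg.lstsq(dec_a.T, (x_a - mean[:d_a]).T, rcond=None)
    return z.T @ dec_b + mean[d_a:]


# Fit on the first 150 samples, evaluate on the remaining 50.
train, test = slice(0, 150), slice(150, None)
mean, decoder = fit_linear_autoencoder(x_a[train], x_b[train], latent_dim)
latent = encode(x_a[test], x_b[test], mean, decoder)
x_b_hat = impute_b_from_a(x_a[test], mean, decoder, d_a)
rel_error = np.linalg.norm(x_b_hat - x_b[test]) / np.linalg.norm(x_b[test])
print(latent.shape, float(rel_error))
```

In this toy setting the imputation works only because both modalities were generated from the same low-dimensional factors with little noise. Real multi-omics data are nonlinear, noisy and high-dimensional, which is the motivation for the deep and generative architectures the review surveys.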

CONCLUSION

We expect to see further growth in methods that can handle missingness, as this is a common challenge in working with complex and heterogeneous data. Additionally, methods that integrate more data types are expected to improve performance on downstream tasks by capturing a comprehensive view of each sample.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f76f/11446004/544d2fa4fcb9/13040_2024_391_Fig1_HTML.jpg
