Algorithms and tools for data-driven omics integration to achieve multilayer biological insights: a narrative review.

Author information

Morabito Aurelia, De Simone Giulia, Pastorelli Roberta, Brunelli Laura, Ferrario Manuela

Affiliations

Laboratory of Metabolites and Proteins in Translational Research, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, 20156, Milan, Italy.

Department of Electronics, Information and Bioengineering, Politecnico di Milano, 20133, Milan, Italy.

Publication information

J Transl Med. 2025 Apr 10;23(1):425. doi: 10.1186/s12967-025-06446-x.

DOI:10.1186/s12967-025-06446-x

PMID:40211300

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11987215/

Abstract

Systems biology is a holistic approach to biological sciences that combines experimental and computational strategies to integrate information from different scales of biological processes and unravel pathophysiological mechanisms and behaviours. In this context, high-throughput technologies have played a major role in providing large amounts of omics data, whose integration offers unprecedented possibilities for gaining insights into diseases and identifying potential biomarkers. In the present review, we focus on strategies applied in the literature from 2018 to 2024 to integrate genomics, transcriptomics, proteomics, and metabolomics. Integration approaches were divided into three main categories: statistical approaches, multivariate methods, and machine learning/artificial intelligence techniques. Among them, statistical approaches (mainly based on correlation) were slightly more prevalent, followed by multivariate approaches and machine learning techniques. Integrating multiple biological layers has shown great potential in uncovering molecular mechanisms, identifying putative biomarkers, and aiding classification, in most cases yielding better performance than single-omics analyses. However, significant challenges remain. The high-throughput nature of omics platforms introduces issues such as variable data quality, missing values, collinearity, and high dimensionality. These challenges are compounded when multiple omics datasets are combined, as the complexity and heterogeneity of the data increase with integration. We report different strategies found in the literature to cope with these challenges, but some open issues remain and should be addressed to unlock the full potential of omics integration.
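As a rough illustration of the correlation-based category mentioned in the abstract, the sketch below computes pairwise Pearson correlations between features of two omics layers measured on the same samples and ranks the strongest pairs. It is a minimal sketch under stated assumptions, not a method taken from the review: the function names `cross_correlation` and `top_pairs`, the layer sizes, and the simulated data are all hypothetical, and a real analysis would also need to handle missing values and multiple-testing correction.

```python
import numpy as np

def cross_correlation(x, y):
    """Pearson correlation between every column of x and every column of y.

    x: (n_samples, p) array, y: (n_samples, q) array, rows in the same sample order.
    Returns a (p, q) matrix of correlation coefficients.
    """
    # Standardize each feature (population std), then average the products.
    xz = (x - x.mean(axis=0)) / x.std(axis=0)
    yz = (y - y.mean(axis=0)) / y.std(axis=0)
    return xz.T @ yz / x.shape[0]

def top_pairs(corr, k=3):
    """Return the k (row, col, r) triples with the largest absolute correlation."""
    flat = np.argsort(-np.abs(corr), axis=None)[:k]
    rows, cols = np.unravel_index(flat, corr.shape)
    return [(int(i), int(j), float(corr[i, j])) for i, j in zip(rows, cols)]

# Simulated toy data (hypothetical, not from the review):
# 40 samples, 5 "transcripts", 4 "metabolites".
rng = np.random.default_rng(0)
transcripts = rng.normal(size=(40, 5))
metabolites = rng.normal(size=(40, 4))
# Plant one known association: metabolite 2 tracks transcript 1.
metabolites[:, 2] = 0.9 * transcripts[:, 1] + 0.1 * rng.normal(size=40)

corr = cross_correlation(transcripts, metabolites)
best = top_pairs(corr, k=1)[0]
print(f"strongest pair: transcript {best[0]}, metabolite {best[1]}, r = {best[2]:.2f}")
```

With the planted association, the top-ranked pair should be transcript 1 and metabolite 2. On real data the number of feature pairs grows quickly with dimensionality, which is one reason the challenges listed in the abstract (collinearity, missing values, dimensionality) matter for this kind of analysis.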


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0291/11987215/d2cc510f17a5/12967_2025_6446_Fig1_HTML.jpg



