Algorithms and tools for data-driven omics integration to achieve multilayer biological insights: a narrative review.

Authors

Morabito Aurelia, De Simone Giulia, Pastorelli Roberta, Brunelli Laura, Ferrario Manuela

Affiliations

Laboratory of Metabolites and Proteins in Translational Research, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, 20156, Milan, Italy.

Department of Electronics, Information and Bioengineering, Politecnico di Milano, 20133, Milan, Italy.

Publication information

J Transl Med. 2025 Apr 10;23(1):425. doi: 10.1186/s12967-025-06446-x.

Abstract

Systems biology is a holistic approach to the biological sciences that combines experimental and computational strategies to integrate information from different scales of biological processes and unravel pathophysiological mechanisms and behaviours. In this scenario, high-throughput technologies have played a major role by providing huge amounts of omics data, whose integration offers unprecedented possibilities for gaining insights into diseases and identifying potential biomarkers. In the present review, we focus on strategies applied in the literature between 2018 and 2024 to integrate genomics, transcriptomics, proteomics, and metabolomics. Integration approaches were divided into three main categories: statistical-based approaches, multivariate methods, and machine learning/artificial intelligence techniques. Among them, statistical approaches (mainly based on correlation) were slightly more prevalent, followed by multivariate approaches and then machine learning techniques. Integrating multiple biological layers has shown great potential in uncovering molecular mechanisms, identifying putative biomarkers, and aiding classification, most of the time resulting in better performance than single-omics analyses. However, significant challenges remain. The high-throughput nature of omics platforms introduces issues such as variable data quality, missing values, collinearity, and high dimensionality. These challenges grow when multiple omics datasets are combined, because the complexity and heterogeneity of the data increase with integration. We report different strategies found in the literature to cope with these challenges, but some open issues remain and should be addressed to realize the full potential of omics integration.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0291/11987215/d2cc510f17a5/12967_2025_6446_Fig1_HTML.jpg
