Morabito Aurelia, De Simone Giulia, Pastorelli Roberta, Brunelli Laura, Ferrario Manuela
Laboratory of Metabolites and Proteins in Translational Research, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, 20156, Milan, Italy.
Department of Electronics, Information and Bioengineering, Politecnico di Milano, 20133, Milan, Italy.
J Transl Med. 2025 Apr 10;23(1):425. doi: 10.1186/s12967-025-06446-x.
Systems biology is a holistic approach to the biological sciences that combines experimental and computational strategies to integrate information from different scales of biological processes and unravel pathophysiological mechanisms and behaviours. In this scenario, high-throughput technologies have played a major role in providing huge amounts of omics data, whose integration would offer unprecedented possibilities for gaining insights into diseases and identifying potential biomarkers. In the present review, we focus on strategies that have been applied in the literature between 2018 and 2024 to integrate genomics, transcriptomics, proteomics, and metabolomics. Integration approaches were divided into three main categories: statistical approaches, multivariate methods, and machine learning/artificial intelligence techniques. Among them, statistical approaches (mainly based on correlation) were slightly the most prevalent, followed by multivariate approaches and then machine learning techniques. Integrating multiple biological layers has shown great potential in uncovering molecular mechanisms, identifying putative biomarkers, and aiding classification, in most cases yielding better performance than single-omics analyses. However, significant challenges remain. The high-throughput nature of omics platforms introduces issues such as variable data quality, missing values, collinearity, and high dimensionality. These challenges are compounded when multiple omics datasets are combined, because the complexity and heterogeneity of the data grow with integration. We report different strategies found in the literature to cope with these challenges, but some issues remain open and should be addressed to realize the full potential of omics integration.