Adam Elodie, Zanoaga Mihaela-Diana, Rota Riccardo, Cominetti Ornella
Nestlé Institute of Food Safety & Analytical Sciences, Nestlé Research; D-USYS, ETH Zürich.
Statistical Genetics Group, University of Lausanne (UNIL).
J Vis Exp. 2025 Aug 8;(222). doi: 10.3791/66995.
This manuscript provides a step-by-step guide to integrating multi-omics data in biological research. Multi-omics data integration is the process of combining and analyzing data measured on the same set of biological samples with different omics technologies, such as genomics, epigenomics, transcriptomics, proteomics, metabolomics, microbiomics, lipidomics, and glycomics. Although multi-omics approaches share the objectives of single-block (single-omics) analyses, for instance description, discrimination, classification, or prediction, they capture a broader spectrum of molecular information and thus provide a deeper understanding of biological systems and their complex interactions. Combining multiple omics datasets can improve prediction accuracy and yield more robust results, especially when the number of available samples is limited. Moreover, owing in part to recent developments in machine learning, multi-omics analyses can now uncover hidden patterns and complex phenomena arising among different biological compounds. The primary aim of this work is to present the full protocol commonly used in multi-omics studies, from the initial formulation of the problem to the tools for biological interpretation of the results. The manuscript describes in detail the main approaches to multi-omics data integration, namely concatenation-based (low-level), transformation-based (mid-level), and model-based (high-level), highlights their advantages and limitations, and presents general visualization and diagnostic tools.
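The three integration levels named in the abstract can be illustrated with a minimal sketch. It is not the protocol's own implementation: the function names, the simulated blocks, the 1/sqrt(p) block weighting, the use of PCA as the per-block transformation, and the averaging of per-block least-squares predictions are all assumptions chosen only to make the distinction between the levels concrete.

```python
import numpy as np

def autoscale(block):
    """Center each column and scale it to unit variance."""
    centered = block - block.mean(axis=0)
    std = block.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant columns at zero after centering
    return centered / std

def pca_scores(block, n_components):
    """Scores on the first principal components of one block (via SVD)."""
    u, s, _ = np.linalg.svd(autoscale(block), full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def low_level(blocks):
    """Concatenation-based: scale the blocks, then join all features.

    Dividing by sqrt(number of features) is one assumed way to keep a
    large block from dominating a small one.
    """
    return np.hstack([autoscale(b) / np.sqrt(b.shape[1]) for b in blocks])

def mid_level(blocks, n_components=2):
    """Transformation-based: reduce each block first, then join the scores."""
    return np.hstack([pca_scores(b, n_components) for b in blocks])

def high_level(blocks, y, n_components=2):
    """Model-based: fit one model per block, then combine the predictions."""
    predictions = []
    for b in blocks:
        design = np.column_stack([np.ones(len(b)), pca_scores(b, n_components)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        predictions.append(design @ coef)
    return np.mean(predictions, axis=0)

# Simulated (hypothetical) data: two omics blocks measured on the same 20 samples.
rng = np.random.default_rng(0)
transcripts = rng.normal(size=(20, 50))
metabolites = rng.normal(size=(20, 8))
y = rng.normal(size=20)

low = low_level([transcripts, metabolites])       # one wide matrix of all features
mid = mid_level([transcripts, metabolites])       # a few components per block
pred = high_level([transcripts, metabolites], y)  # averaged per-block predictions
```

In this sketch the rows of every block must refer to the same samples in the same order, which matches the abstract's requirement that the data be measured on the same set of biological samples.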