Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany.
Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA.
Anal Chim Acta. 2021 Jan 2;1141:144-162. doi: 10.1016/j.aca.2020.10.038. Epub 2020 Oct 22.
Recent advances in high-throughput technologies have enabled the profiling of multiple layers of a biological system, including DNA sequence data (genomics), RNA expression levels (transcriptomics), and metabolite levels (metabolomics). This has led to the generation of vast amounts of biological data that can be integrated in so-called multi-omics studies to examine the complex molecular underpinnings of health and disease. Integrative analysis of such datasets is not straightforward and is particularly complicated by the high dimensionality and heterogeneity of the data and by the lack of universal analysis protocols. Previous reviews have discussed various strategies to address the challenges of data integration, elaborating on specific aspects such as network inference or feature selection techniques. Their main focus has been on integrating two omics layers in relation to a phenotype of interest. In this review, we provide an overview of a typical multi-omics workflow, focusing on integration methods that have the potential to combine metabolomics data with two or more omics layers. We discuss multiple integration concepts, including data-driven, knowledge-based, simultaneous, and step-wise approaches. We highlight the application of these methods in recent multi-omics studies, including large-scale integration efforts that aim at a global depiction of the complex relationships within and between different biological layers without focusing on a particular phenotype.