Zitnik Marinka, Nguyen Francis, Wang Bo, Leskovec Jure, Goldenberg Anna, Hoffman Michael M
Department of Computer Science, Stanford University, Stanford, CA, USA.
Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada.
Inf Fusion. 2019 Oct;50:71-91. doi: 10.1016/j.inffus.2018.09.012. Epub 2018 Sep 21.
New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include myriad properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.