Integrative Analysis of Omics Big Data.

Author information

Yu Xiang-Tian, Zeng Tao

Affiliation

Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China.

Publication information

Methods Mol Biol. 2018;1754:109-135. doi: 10.1007/978-1-4939-7717-8_7.

DOI:10.1007/978-1-4939-7717-8_7

PMID:29536440

Abstract

The diversity and sheer volume of omics data have brought biological and biomedical research and applications into a big data era, much as happened in society at large a decade ago. These data pose a new challenge in the shift from horizontal data ensembles (e.g., similar types of data collected from different labs or companies) to vertical data ensembles (e.g., different types of data collected for a group of individuals with matched information). This shift requires integrative analysis in biology and biomedicine and calls for the urgent development of data integration methods to address the major change from population-guided to individual-guided investigations. Data integration is an effective approach for solving complex problems and understanding complex systems. Several benchmark studies have revealed the heterogeneity and trade-offs that exist in the analysis of omics data. Integrative analysis can combine and investigate many datasets in a cost-effective and reproducible way. Current approaches to integrating biological data follow two modes: a "bottom-up integration" mode with follow-up manual integration, and a "top-down integration" mode with follow-up in silico integration. This paper first summarizes combinatory analysis approaches to provide candidate protocols for designing biological experiments for effective integrative studies in genomics. It then surveys data fusion approaches to provide guidance on developing computational models for detecting biological significance; these approaches have also provided new data resources and analysis tools to support precision medicine that depends on big biomedical data. Finally, the problems and future directions for integrative analysis of omics big data are highlighted.
