Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Wiley Interdiscip Rev Syst Biol Med. 2013 Nov-Dec;5(6):677-86. doi: 10.1002/wsbm.1242. Epub 2013 Sep 9.
Systems biology approaches to epidemiological studies of complex diseases involve collecting genetic, genomic, epigenomic, and metagenomic data in large-scale studies of complex phenotypes. The design and analysis of such studies raise many statistical challenges. This article reviews some issues related to the integrative analysis of such high-dimensional and interrelated datasets and outlines some possible solutions. I focus my review on integrative approaches for genome-wide genetic variants and gene expression data, methods for joint analysis of genetic and epigenetic variants, and methods for analysis of microbiome data. Statistical methods such as mediation analysis, high-dimensional instrumental variable regression, sparse signal recovery, and compositional data regression provide potential frameworks for integrative analysis of these high-dimensional genomic data.