Qin Zhaohui, Li Ben, Conneely Karen N, Wu Hao, Hu Ming, Ayyala Deepak, Park Yongseok, Jin Victor X, Zhang Fangyuan, Zhang Han, Li Li, Lin Shili
Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA.
Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA.
Stat Biosci. 2016 Oct;8(2):284-309. doi: 10.1007/s12561-016-9145-0. Epub 2016 Mar 7.
With the rapid development of high-throughput technologies such as arrays and next-generation sequencing (NGS), genome-wide, nucleotide-resolution epigenomic data are increasingly available. In recent years, there has been particular interest in data on DNA methylation and 3-dimensional (3D) chromosomal organization, which are believed to hold keys to understanding biological mechanisms, such as transcription regulation, that are closely linked to human health and disease. However, small sample sizes, complicated correlation structures, substantial noise, biases, and uncertainties all present difficulties for statistical inference. In this review, we present an overview of the new technologies frequently used to study DNA methylation and 3D chromosomal organization. We focus on recent developments in statistical methodologies designed to better interrogate epigenomic data, pointing out statistical challenges facing the field where appropriate.