Department of Statistical Science, Southern Methodist University, Dallas, TX 75275, USA.
Stat Med. 2013 Jun 15;32(13):2292-307. doi: 10.1002/sim.5658. Epub 2012 Oct 25.
Epigenetics is the study of changes to the genome that can switch genes on or off and determine which proteins are expressed without altering the DNA sequence. Recently, epigenetic changes have been linked to the development and progression of diseases such as psychiatric disorders. High-throughput epigenetic experiments have enabled researchers to measure genome-wide epigenetic profiles, yielding data that consist of intensity ratios of immunoprecipitation versus reference samples. The intensity ratios provide a view of the genomic regions where protein binding occurs under one experimental condition and further allow us to detect epigenetic alterations by comparing two different conditions. However, such experiments can be expensive, so only a few replicates are typically available. Moreover, epigenetic data are often spatially correlated and have high noise levels. In this paper, we develop a Bayesian hierarchical model, combined with hidden Markov processes with four states for modeling spatial dependence, to detect genomic sites with epigenetic changes from two-sample experiments with paired internal control. One attractive feature of the proposed method is that the four states of the hidden Markov process have well-defined biological meanings and allow us to call the change patterns directly from the corresponding posterior probabilities. In contrast, none of the existing methods offers this advantage. In addition, the proposed method gains power in statistical inference through spatial smoothing (via hidden Markov modeling) and information pooling (via hierarchical modeling). Both simulation studies and real data analysis in a cocaine addiction study illustrate the reliability and success of this method.
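To make the idea of calling change patterns from posterior state probabilities concrete, the following is a minimal sketch and not the paper's model. It assumes, purely for illustration, that the four hidden states encode whether a site is enriched in each of the two conditions, that log intensity ratios are Gaussian with fixed parameters, and that the transition matrix is known. The names `STATES`, `emission`, and `posterior_states` and all numeric values are hypothetical. The hierarchical priors and the Bayesian estimation described above are omitted; only the hidden Markov smoothing step is shown, using the standard scaled forward-backward recursions.

```python
import math

# Hypothetical state labels (an assumption, not taken from the paper):
# (enriched in condition 1, enriched in condition 2).
STATES = [(0, 0), (1, 0), (0, 1), (1, 1)]


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def emission(obs, state, enriched_mean, sd):
    """Likelihood of one probe's pair of log ratios under a given state."""
    p = 1.0
    for x, flag in zip(obs, state):
        p *= normal_pdf(x, enriched_mean if flag else 0.0, sd)
    return p


def posterior_states(observations, stay=0.9, enriched_mean=2.0, sd=0.5):
    """Scaled forward-backward pass.

    Returns, for each probe, the posterior probability of each entry in STATES.
    All parameters are fixed illustrative values, not estimates.
    """
    k, n = len(STATES), len(observations)
    move = (1.0 - stay) / (k - 1)
    trans = [[stay if i == j else move for j in range(k)] for i in range(k)]
    emit = [[emission(o, s, enriched_mean, sd) for s in STATES]
            for o in observations]

    # Forward pass, normalized at each step to avoid underflow.
    first = [emit[0][j] / k for j in range(k)]  # uniform initial distribution
    total = sum(first)
    alpha = [[a / total for a in first]]
    for t in range(1, n):
        cur = [emit[t][j] * sum(alpha[t - 1][i] * trans[i][j] for i in range(k))
               for j in range(k)]
        total = sum(cur)
        alpha.append([a / total for a in cur])

    # Backward pass, also normalized at each step.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        cur = [sum(trans[i][j] * emit[t + 1][j] * beta[t + 1][j] for j in range(k))
               for i in range(k)]
        total = sum(cur)
        beta[t] = [b / total for b in cur]

    # Combine and renormalize to obtain marginal posteriors per probe.
    posterior = []
    for t in range(n):
        w = [alpha[t][j] * beta[t][j] for j in range(k)]
        total = sum(w)
        posterior.append([x / total for x in w])
    return posterior


if __name__ == "__main__":
    # Made-up log ratios for six adjacent probes: (condition 1, condition 2).
    data = [(0.1, -0.2), (0.0, 0.3), (2.1, 0.1),
            (1.8, -0.1), (2.2, 0.2), (0.1, 0.0)]
    for probe, probs in zip(data, posterior_states(data)):
        best = max(range(len(STATES)), key=lambda j: probs[j])
        print(probe, STATES[best], round(probs[best], 3))
```

Calling each site by its most probable state mirrors the direct pattern calling described above. In a full analysis, the emission and transition parameters would be inferred jointly with the states rather than fixed in advance.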