Xu Zheng, Zhang Guosheng, Jin Fulai, Chen Mengjie, Furey Terrence S, Sullivan Patrick F, Qin Zhaohui, Hu Ming, Li Yun
Department of Biostatistics, Department of Genetics, Department of Computer Science.
Department of Computer Science, Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, NC 27599, USA.
Bioinformatics. 2016 Mar 1;32(5):650-6. doi: 10.1093/bioinformatics/btv650. Epub 2015 Nov 4.
Advances in chromosome conformation capture and next-generation sequencing technologies are enabling genome-wide investigation of dynamic chromatin interactions. For example, Hi-C experiments generate genome-wide contact frequencies between pairs of loci by sequencing DNA segments ligated from loci in close spatial proximity. One essential task in such studies is peak calling, that is, detecting non-random interactions between loci from the two-dimensional contact frequency matrix. Accomplishing this task has important implications, including identifying long-range interactions that help interpret a sizable fraction of the results from genome-wide association studies. The task, distinguishing biologically meaningful chromatin interactions from massive numbers of random interactions, poses great challenges both statistically and computationally. Model-based methods to address this challenge are still lacking. In particular, no statistical model exists that takes the underlying dependency structure into consideration.
In this paper, we propose a hidden Markov random field (HMRF) based Bayesian method to rigorously model interaction probabilities in the two-dimensional space based on the contact frequency matrix. By borrowing information from neighboring loci pairs, our method demonstrates superior reproducibility and statistical power in both simulation studies and real data analysis.
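To make the idea of borrowing information from neighboring loci pairs concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Poisson emission model, a four-neighbor Ising prior on hidden peak labels, and fixed parameters `theta` (peak signal strength) and `psi` (neighbor coupling). It also swaps full Bayesian inference for iterated conditional modes (ICM), a simpler point-estimate technique. All function and parameter names are hypothetical.

```python
import numpy as np


def neighbor_sum(z):
    """Sum of the up/down/left/right neighbor labels for every cell.

    Cells on the matrix border simply have fewer neighbors.
    """
    s = np.zeros(z.shape, dtype=float)
    s[1:, :] += z[:-1, :]   # neighbor above
    s[:-1, :] += z[1:, :]   # neighbor below
    s[:, 1:] += z[:, :-1]   # neighbor to the left
    s[:, :-1] += z[:, 1:]   # neighbor to the right
    return s


def poisson_log_lr(counts, expected, theta):
    """Log-likelihood ratio of 'peak' versus 'background' per cell.

    Assumed model: background mean = expected, peak mean = expected * exp(theta).
    """
    counts = np.asarray(counts, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return counts * theta - expected * (np.exp(theta) - 1.0)


def icm_peak_labels(counts, expected, theta=1.0, psi=0.5, n_iter=10):
    """Estimate hidden labels z in {-1, +1} by iterated conditional modes.

    Assumed prior: P(z) proportional to exp(psi * sum over neighbor pairs of z_a * z_b),
    so the conditional log-odds of z = +1 at a cell is
    log-likelihood ratio + 2 * psi * (sum of neighbor labels).
    """
    llr = poisson_log_lr(counts, expected, theta)
    z = np.where(llr > 0, 1, -1)           # start from independent per-cell calls
    ii, jj = np.indices(llr.shape)
    for _ in range(n_iter):
        # Checkerboard sweep: cells of one color are never neighbors of each
        # other, so updating them together is a valid coordinate-wise step.
        for parity in (0, 1):
            mask = (ii + jj) % 2 == parity
            field = llr + 2.0 * psi * neighbor_sum(z)
            z[mask] = np.where(field[mask] > 0, 1, -1)
    return z


if __name__ == "__main__":
    # Toy 7 x 7 contact matrix: expected count 10 everywhere, a 3 x 3 block of
    # strong contacts with one weak cell in its center, and one isolated noisy cell.
    expected = np.full((7, 7), 10.0)
    counts = np.full((7, 7), 10.0)
    counts[2:5, 2:5] = 40.0
    counts[3, 3] = 16.0   # weak cell surrounded by peaks
    counts[6, 0] = 18.0   # isolated noisy cell
    independent = np.where(poisson_log_lr(counts, expected, 1.0) > 0, 1, -1)
    smoothed = icm_peak_labels(counts, expected)
    print(independent)
    print(smoothed)
```

In this toy example, calling each cell independently misses the weak cell inside the block and flags the isolated noisy cell, whereas the neighbor-aware update recovers the former and suppresses the latter. Larger values of `psi` enforce stronger spatial smoothing.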
Source code can be downloaded at: http://www.unc.edu/~yunmli/HMRFBayesHiC
CONTACT: ming.hu@nyumc.org or yunli@med.unc.edu
Supplementary data are available at Bioinformatics online.