Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA.
Biostatistics. 2012 Jan;13(1):113-28. doi: 10.1093/biostatistics/kxr029. Epub 2011 Sep 13.
Chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) is a powerful technique used in a wide range of biological studies, including genome-wide measurements of protein-DNA interactions, DNA methylation, and histone modifications. The vast amount of data, together with the biases introduced by sequencing and/or genome mapping, poses new challenges and calls for effective methods and fast computer programs for statistical analysis. To systematically model ChIP-seq data, we build a dynamic signal profile for each chromosome and then model the profile using a fully Bayesian hidden Ising model. The proposed model naturally takes into account spatial dependency and the global and local distributions of sequence tags. It can be used for both one-sample and two-sample analyses. Through model diagnosis, the proposed method can detect falsely enriched regions caused by sequencing and/or mapping errors, a capability that existing hypothesis-testing-based methods usually do not offer. The proposed method is illustrated using 3 transcription factor (TF) ChIP-seq data sets and 2 mixed ChIP-seq data sets and is compared with 4 popular and/or well-documented methods: MACS, CisGenome, BayesPeak, and SISSRs. The results indicate that the proposed method achieves equivalent or higher sensitivity and spatial resolution in detecting TF binding sites, with a much lower false discovery rate.
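To make the modeling idea in the abstract concrete, the following is a minimal sketch of a one-dimensional hidden Ising model applied to binned tag counts. It is not the authors' implementation, and several simplifications are assumptions made only for illustration: each bin has a hidden state of -1 (background) or +1 (enriched), adjacent bins are coupled with a fixed interaction strength `beta`, counts are drawn from a Poisson distribution whose rate depends on the state, and all parameters are held fixed rather than given priors as a fully Bayesian treatment would. The function name `gibbs_hidden_ising` and its parameters are hypothetical.

```python
import math
import random


def poisson_logpmf(k, lam):
    """Log probability of observing count k under a Poisson(lam) model."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gibbs_hidden_ising(counts, lam_bg=1.0, lam_enriched=8.0, beta=1.0,
                       n_sweeps=200, burn_in=50, seed=0):
    """Estimate per-bin posterior probability of enrichment.

    Assumed (illustrative) model: hidden states s_i in {-1, +1} with prior
    proportional to exp(beta * sum_i s_i * s_{i+1}), and counts
    y_i ~ Poisson(lam_enriched) if s_i = +1, else Poisson(lam_bg).
    Parameters are fixed here; a fully Bayesian version would sample them.
    """
    rng = random.Random(seed)
    n = len(counts)
    states = [-1] * n          # start with every bin labeled background
    enriched_hits = [0] * n    # number of kept sweeps with s_i = +1

    for sweep in range(n_sweeps):
        for i in range(n):
            # Sum of neighboring states carries the spatial dependency.
            neighbor_sum = 0
            if i > 0:
                neighbor_sum += states[i - 1]
            if i < n - 1:
                neighbor_sum += states[i + 1]
            # Conditional log-odds of s_i = +1 versus s_i = -1.
            log_odds = (2.0 * beta * neighbor_sum
                        + poisson_logpmf(counts[i], lam_enriched)
                        - poisson_logpmf(counts[i], lam_bg))
            states[i] = 1 if rng.random() < sigmoid(log_odds) else -1
        if sweep >= burn_in:
            for i in range(n):
                if states[i] == 1:
                    enriched_hits[i] += 1

    kept = n_sweeps - burn_in
    return [hits / kept for hits in enriched_hits]


if __name__ == "__main__":
    # Toy profile: low background counts with one run of high counts.
    toy_counts = [0, 1, 0, 2, 1, 12, 9, 11, 10, 1, 0, 1]
    for count, prob in zip(toy_counts, gibbs_hidden_ising(toy_counts)):
        print(count, round(prob, 2))
```

In this sketch the coupling term is what distinguishes the approach from testing each bin independently: a bin's label is pulled toward the labels of its neighbors, so isolated moderate counts are less likely to be called enriched than the same counts inside a run of high-count bins.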