Manuel Allhoff, Kristin Seré, Juliana F. Pires, Martin Zenke, Ivan G. Costa
IZKF Bioinformatics Research Group, RWTH Aachen University Medical School, Pauwelsstr. 19, 52074 Aachen, Germany.
Aachen Institute for Advanced Study in Computational Engineering Science (AICES), RWTH Aachen University, Schinkelstr. 2, 52062 Aachen, Germany.
Nucleic Acids Res. 2016 Nov 16;44(20):e153. doi: 10.1093/nar/gkw680. Epub 2016 Aug 2.
The study of changes in protein-DNA interactions measured by ChIP-seq in dynamic systems, such as cell differentiation, response to treatments or the comparison of healthy and diseased individuals, is still an open challenge. Few computational methods compare changes in ChIP-seq signals across conditions with replicates. Moreover, none of these previous approaches addresses the ChIP-seq-specific experimental artefacts that arise in studies with biological replicates. We propose THOR, a Hidden Markov Model-based approach, to detect differential peaks between pairs of biological conditions with replicates. THOR provides all pre- and post-processing steps required in ChIP-seq analyses. In addition, we propose a novel normalization approach based on housekeeping genes to deal with cases where replicates have distinct signal-to-noise ratios. To evaluate differential peak calling methods, we delineate a methodology using both biological and simulated data. This includes an evaluation procedure that associates differential peaks with changes in gene expression as well as with histone modifications close to these peaks. We evaluate THOR and seven competing methods on data sets with distinct characteristics, ranging from in vitro studies with technical replicates to clinical studies of cancer patients. Our evaluation comprises 13 comparisons between pairs of biological conditions. We show that THOR performs best in all scenarios.
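The idea behind normalizing on housekeeping genes can be illustrated with a minimal sketch. This is not THOR's implementation; it only assumes that reads over a fixed set of housekeeping gene regions should be comparable across replicates, so each sample is rescaled until its housekeeping total matches the mean across samples. The function names, the sample names and the read counts below are hypothetical.

```python
from statistics import mean


def housekeeping_scaling_factors(counts):
    """Return one scaling factor per sample.

    counts maps a sample name to its read counts over the same
    housekeeping gene regions (hypothetical input layout).
    """
    totals = {sample: sum(values) for sample, values in counts.items()}
    if any(total <= 0 for total in totals.values()):
        raise ValueError("every sample needs reads in the housekeeping regions")
    # Assumption: the mean housekeeping total is the common reference level.
    target = mean(totals.values())
    return {sample: target / total for sample, total in totals.items()}


def normalize(signal, factor):
    """Rescale a genome-wide signal (list of bin counts) by a sample's factor."""
    return [value * factor for value in signal]


# Made-up counts over three housekeeping regions for two replicates.
hk_counts = {"rep1": [120, 80, 100], "rep2": [60, 40, 50]}
factors = housekeeping_scaling_factors(hk_counts)
normalized_rep2 = normalize([10, 20], factors["rep2"])
```

In this toy input, the replicate with half the housekeeping signal receives a factor twice as large, which is the behaviour one would want when replicates differ in signal-to-noise ratio.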