Zhang Yanxiao, Lin Yu-Hsuan, Johnson Timothy D, Rozek Laura S, Sartor Maureen A
Department of Computational Medicine and Bioinformatics, Department of Biostatistics and Department of Environmental Health Sciences, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA.
Bioinformatics. 2014 Sep 15;30(18):2568-75. doi: 10.1093/bioinformatics/btu372. Epub 2014 Jun 3.
ChIP-Seq is the standard method to identify genome-wide DNA-binding sites for transcription factors (TFs) and histone modifications. There is a growing need to analyze experiments with biological replicates, especially for epigenomic experiments where variation among biological samples can be substantial. However, tools that can perform group comparisons are currently lacking.
We present a peak-calling prioritization pipeline (PePr) for identifying consistent or differential binding sites in ChIP-Seq experiments with biological replicates. PePr models read counts across the genome among biological samples with a negative binomial distribution and uses a local variance estimation method, ranking consistent or differential binding sites more favorably than sites with greater variability. We compared PePr with commonly used and recently proposed approaches on eight TF datasets and show that PePr uniquely identifies consistent regions with enriched read counts, high motif occurrence rate and known characteristics of TF binding based on visual inspection. For histone modification data with broadly enriched regions, PePr identified differential regions that are consistent within groups and outperformed other methods in scaling False Discovery Rate (FDR) analysis.
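The idea of ranking windows under a negative binomial model with locally estimated variance can be illustrated with a small sketch. This is not PePr's implementation; it is a simplified illustration under stated assumptions: a method-of-moments dispersion estimate, a moving average over neighboring windows as the "local" variance estimate, and a Wald-style score from the delta method. The function names (`nb_dispersion`, `local_dispersion`, `rank_windows`) and the parameters `half_width` and `pseudo` are hypothetical and chosen only for this example.

```python
from math import log, sqrt
from statistics import mean, variance


def nb_dispersion(counts):
    """Method-of-moments NB dispersion phi, from var = mu + phi * mu**2."""
    mu = mean(counts)
    if mu == 0 or len(counts) < 2:
        return 0.0
    # Clamp at zero: under-dispersed replicates are treated as Poisson-like.
    return max((variance(counts) - mu) / mu ** 2, 0.0)


def local_dispersion(per_window, half_width=1):
    """Average each window's dispersion with its neighbors (assumed smoother)."""
    n = len(per_window)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        smoothed.append(mean(per_window[lo:hi]))
    return smoothed


def rank_windows(chip, control, half_width=1, pseudo=0.5):
    """Rank windows by a Wald-style enrichment score, best first.

    chip, control: one list per window, each holding per-replicate read counts.
    Var(log mean) is approximated by (1/mu + phi) / n_replicates (delta method).
    """
    phi_chip = local_dispersion([nb_dispersion(w) for w in chip], half_width)
    phi_ctrl = local_dispersion([nb_dispersion(w) for w in control], half_width)
    scores = []
    for i, (c, k) in enumerate(zip(chip, control)):
        mu_c, mu_k = mean(c) + pseudo, mean(k) + pseudo
        se = sqrt((1 / mu_c + phi_chip[i]) / len(c)
                  + (1 / mu_k + phi_ctrl[i]) / len(k))
        scores.append((log(mu_c / mu_k) / se, i))
    return [i for _, i in sorted(scores, reverse=True)]


# Toy data: windows 0-2 are consistently enriched across replicates,
# windows 3-5 have the same mean enrichment but one dominant replicate.
chip = [[50, 52, 48]] * 3 + [[140, 5, 5]] * 3
control = [[10, 11, 9]] * 6
ranking = rank_windows(chip, control)
```

In this toy input, the windows whose replicates agree receive a smaller standard error and therefore rank ahead of windows with the same mean signal but high between-replicate variability, which mirrors the prioritization described above.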