Bansal Mukesh, Mendiratta Geetu, Anand Santosh, Kushwaha Ritu, Kim Ryan, Kustagi Manju, Iyer Archana, Chaganti Raju S K, Califano Andrea, Sumazin Pavel
BMC Genomics. 2015;16 Suppl 5(Suppl 5):S4. doi: 10.1186/1471-2164-16-S5-S4. Epub 2015 May 26.
Chromatin immunoprecipitation followed by sequencing of protein-bound DNA fragments (ChIP-Seq) is an effective high-throughput methodology for identifying context-specific DNA fragments that are bound by specific proteins in vivo. Despite significant progress in the bioinformatics analysis of these genome-scale data, a number of challenges remain, because technology-dependent biases, including variable target accessibility and mappability, sequence-dependent variability, and non-specific binding affinity, must be accounted for.
We introduce a nonparametric method for scoring consensus regions of aligned immunoprecipitated DNA fragments when appropriate control experiments are available. Our method uses local models for null binding; these are necessary because binding prediction scores based on global models alone fail to properly account for specialized features of genomic regions and chance pull-downs of specific DNA fragments, thus disproportionately rewarding some genomic regions and decreasing prediction accuracy. We make no assumptions about the structure or amplitude of bound peaks, yet we show that our method outperforms leading methods developed using either global or local null-hypothesis models for random binding. We test prediction performance by comparing analyses of ChIP-Seq, ChIP-chip, motif-based binding-site prediction, and shRNA assays, showing high reproducibility, binding-site enrichment in predicted target regions, and functional regulation of predicted targets.
Given appropriate controls, a direct nonparametric method for identifying transcription-factor targets from ChIP-Seq assays may lead to both higher sensitivity and higher specificity, and should be preferred to, or used in conjunction with, methods that use parametric models for null binding.
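To make the idea of a local, nonparametric null concrete, the sketch below scores one genomic window with a rank-based empirical p-value computed against its neighboring windows. It is only an illustration and is not the algorithm published by the authors. The function name `local_empirical_pvalue`, the windowed read-count inputs, the `flank` neighborhood size, and the `pseudocount` are all assumptions introduced here.

```python
from typing import Sequence


def local_empirical_pvalue(
    chip: Sequence[float],
    control: Sequence[float],
    index: int,
    flank: int = 50,
    pseudocount: float = 1.0,
) -> float:
    """Empirical p-value for window `index` against a local null.

    Hypothetical illustration: `chip` and `control` hold read counts per
    fixed-size genomic window. The null distribution is built only from
    windows within `flank` positions of the candidate, so no peak shape
    or global count distribution is assumed.
    """
    if len(chip) != len(control):
        raise ValueError("chip and control must have the same length")
    if not 0 <= index < len(chip):
        raise IndexError("index is outside the window range")

    def enrichment(i: int) -> float:
        # ChIP-over-control ratio; the pseudocount avoids division by zero.
        return (chip[i] + pseudocount) / (control[i] + pseudocount)

    lo = max(0, index - flank)
    hi = min(len(chip), index + flank + 1)

    # Local null: enrichment of every neighboring window except the candidate.
    neighbors = [enrichment(i) for i in range(lo, hi) if i != index]
    observed = enrichment(index)

    # Rank-based p-value with the usual +1 correction, so it is never zero.
    exceed = sum(1 for score in neighbors if score >= observed)
    return (1 + exceed) / (1 + len(neighbors))


if __name__ == "__main__":
    chip_counts = [2] * 10 + [40] + [2] * 10
    control_counts = [2] * 21
    print(local_empirical_pvalue(chip_counts, control_counts, index=10, flank=10))
```

Because the score is a ratio to the control, a window whose control is elevated as strongly as its ChIP signal is not rewarded, and because the null comes from nearby windows only, regional differences in accessibility or mappability affect the candidate and its null alike.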