RECAP reveals the true statistical significance of ChIP-seq peak calls.

Affiliations

Translational and Molecular Medicine Program, University of Ottawa, Ottawa, ON K1H8M5, Canada.

Regenerative Medicine Program, Ottawa Hospital Research Institute, Ottawa, ON K1H8L6, Canada.

Publication information

Bioinformatics. 2019 Oct 1;35(19):3592-3598. doi: 10.1093/bioinformatics/btz150.

Abstract

MOTIVATION

Chromatin Immunoprecipitation (ChIP)-seq is used extensively to identify sites of transcription factor binding or regions of epigenetic modification in the genome. A key step in ChIP-seq analysis is peak calling, where genomic regions enriched for ChIP versus control reads are identified. Many programs have been designed to solve this task, but nearly all fall into the statistical trap of using the data twice: once to determine candidate enriched regions, and again to assess enrichment by classical statistical hypothesis testing. This double use of the data invalidates the statistical significance assigned to enriched regions, so the true significance or reliability of peak calls remains unknown.
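The double-use problem can be illustrated with a toy simulation. The sketch below is not any real peak caller; the window model, the function names and all parameter values are assumptions chosen for illustration. Under the null, each read in a window is ChIP or control with probability 1/2. Testing the window that looks most enriched gives far smaller P-values than testing a window chosen in advance.

```python
import math
import random

def binom_upper_tail(k, n):
    """Exact P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n

def simulate_null(n_trials=500, n_windows=200, reads_per_window=20, seed=0):
    """Return (selected, prespecified) P-value lists from null data.

    Hypothetical toy model: every read is ChIP or control with probability 1/2,
    so no window is truly enriched.
    """
    rng = random.Random(seed)
    selected, prespecified = [], []
    for _ in range(n_trials):
        chip_counts = [
            sum(rng.getrandbits(1) for _ in range(reads_per_window))
            for _ in range(n_windows)
        ]
        # First use of the data: pick the window with the most ChIP reads.
        best = max(chip_counts)
        # Second use of the same data: test that window for enrichment.
        selected.append(binom_upper_tail(best, reads_per_window))
        # For comparison: test a window fixed before looking at the data.
        prespecified.append(binom_upper_tail(chip_counts[0], reads_per_window))
    return selected, prespecified

selected, prespecified = simulate_null()
frac_selected = sum(p < 0.05 for p in selected) / len(selected)
frac_prespecified = sum(p < 0.05 for p in prespecified) / len(prespecified)
print("selected window, fraction of P < 0.05:", frac_selected)
print("prespecified window, fraction of P < 0.05:", frac_prespecified)
```

In this toy setting the prespecified window crosses the 0.05 threshold rarely, as a valid test should, while the selected window crosses it in most trials even though no enrichment exists.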

RESULTS

Using simulated and real ChIP-seq data, we show that three well-known peak callers, MACS, SICER and diffReps, output biased P-values and false discovery rate estimates that can be many orders of magnitude too optimistic. We propose a wrapper algorithm, RECAP, that uses resampling of ChIP-seq and control data to estimate a monotone transform correcting for biases built into peak calling algorithms. When applied to null hypothesis data, where there is no enrichment between ChIP-seq and control, P-values recalibrated by RECAP are approximately uniformly distributed. On data where there is genuine enrichment, RECAP P-values give a better estimate of the true statistical significance of candidate peaks and better false discovery rate estimates, which correlate better with empirical reproducibility. RECAP is a powerful new tool for assessing the true statistical significance of ChIP-seq peak calls.
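A minimal sketch of how such a recalibration could look is given below. It is one plausible reading of the abstract, not the RECAP implementation: `remix`, `recalibrate`, the add-one smoothing and the example numbers are all assumptions. The idea is to build null data by pooling and re-splitting ChIP and control reads, run the same peak caller on it, and then map each original P-value to the fraction of null P-values at or below it. That empirical-CDF map is monotone by construction.

```python
import bisect
import random

def remix(chip_reads, control_reads, rng):
    """Pool both read sets and randomly re-split them into the original sizes.

    Assumption: a simple way to build null data in which ChIP and control are
    exchangeable; the resampling scheme used by RECAP may differ.
    """
    pooled = list(chip_reads) + list(control_reads)
    rng.shuffle(pooled)
    return pooled[:len(chip_reads)], pooled[len(chip_reads):]

def recalibrate(observed_pvals, null_pvals):
    """Map each observed P-value to a smoothed empirical CDF of the null P-values.

    The +1 terms keep recalibrated values away from exactly zero (assumption).
    """
    sorted_null = sorted(null_pvals)
    n = len(sorted_null)
    return [
        (bisect.bisect_right(sorted_null, p) + 1) / (n + 1)
        for p in observed_pvals
    ]

# Hypothetical workflow: read positions and P-values below are made up.
rng = random.Random(0)
null_chip, null_ctrl = remix([101, 250, 399], [120, 310], rng)
# In practice null_pvals would come from running the peak caller on the
# remixed data; here they are placeholder values.
null_pvals = [1e-6, 1e-5, 1e-4, 1e-3, 0.01, 0.1, 0.5, 1.0]
recalibrated = recalibrate([1e-7, 1e-4, 0.2], null_pvals)
print(recalibrated)
```

Because the map depends only on the rank of each P-value among the null P-values, the ordering of candidate peaks is preserved and only their stated significance changes.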

AVAILABILITY AND IMPLEMENTATION

The RECAP software is available through www.perkinslab.ca or on GitHub at https://github.com/theodorejperkins/RECAP.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
