RECAP reveals the true statistical significance of ChIP-seq peak calls.

Affiliations

Translational and Molecular Medicine Program, University of Ottawa, Ottawa, ON K1H8M5, Canada.

Regenerative Medicine Program, Ottawa Hospital Research Institute, Ottawa, ON K1H8L6, Canada.

Publication information

Bioinformatics. 2019 Oct 1;35(19):3592-3598. doi: 10.1093/bioinformatics/btz150.

PMID:30824903

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6761936/

Abstract

MOTIVATION

Chromatin immunoprecipitation (ChIP)-seq is used extensively to identify sites of transcription factor binding or regions of epigenetic modification in the genome. A key step in ChIP-seq analysis is peak calling, in which genomic regions enriched for ChIP reads relative to control reads are identified. Many programs have been designed for this task, but nearly all fall into the statistical trap of using the data twice: once to determine candidate enriched regions, and again to assess their enrichment by classical statistical hypothesis testing. This double use of the data invalidates the statistical significance assigned to enriched regions; thus, the true significance or reliability of peak calls remains unknown.
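
The statistical trap described above can be reproduced in a toy simulation. The sketch below is not MACS, SICER or diffReps and is not taken from the paper; it assumes reads are binned into fixed windows, ChIP and control counts are Poisson with identical rates (so there is no true enrichment anywhere), and a hypothetical "peak caller" picks the window with the most ChIP reads and then tests that same window with a one-sided Poisson test. All function names (sample_poisson, poisson_sf, one_null_dataset, false_positive_rates) are illustrative.

```python
import math
import random
from typing import Tuple


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def poisson_sf(k: int, lam: float) -> float:
    """Upper-tail probability P(X >= k) for X ~ Poisson(lam)."""
    term, cdf = math.exp(-lam), 0.0
    for i in range(k):  # accumulate P(X <= k - 1)
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


def one_null_dataset(n_windows: int, lam: float,
                     rng: random.Random) -> Tuple[float, float]:
    """Simulate ChIP and control with identical rates (no enrichment anywhere).
    Returns (P-value of a window fixed in advance,
             P-value of the window chosen because it has the most ChIP reads)."""
    chip = [sample_poisson(lam, rng) for _ in range(n_windows)]
    control = [sample_poisson(lam, rng) for _ in range(n_windows)]
    background = sum(control) / n_windows  # expected reads per window, from control
    fixed_p = poisson_sf(chip[0], background)  # window 0 chosen without looking
    selected_p = poisson_sf(max(chip), background)  # data pick AND test the window
    return fixed_p, selected_p


def false_positive_rates(trials: int = 200, n_windows: int = 500, lam: float = 10.0,
                         alpha: float = 0.05, seed: int = 1) -> Tuple[float, float]:
    """Fraction of null datasets declared significant at level alpha."""
    rng = random.Random(seed)
    results = [one_null_dataset(n_windows, lam, rng) for _ in range(trials)]
    fixed_rate = sum(p < alpha for p, _ in results) / trials
    selected_rate = sum(p < alpha for _, p in results) / trials
    return fixed_rate, selected_rate


if __name__ == "__main__":
    fixed, selected = false_positive_rates()
    print(f"window fixed in advance: {fixed:.3f} of null datasets with P < 0.05")
    print(f"window selected by data: {selected:.3f} of null datasets with P < 0.05")
```

When the window is fixed before looking at the data, at most about 5% of null datasets should give P < 0.05, as a valid test requires; when the same data first select the window and then test it, nearly every null dataset yields a "significant" peak.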

RESULTS

Using simulated and real ChIP-seq data, we show that three well-known peak callers, MACS, SICER and diffReps, output biased P-values and false discovery rate estimates that can be many orders of magnitude too optimistic. We propose a wrapper algorithm, RECAP, that uses resampling of ChIP-seq and control data to estimate a monotone transform correcting for biases built into peak calling algorithms. When applied to null hypothesis data, where there is no enrichment of ChIP-seq over control, P-values recalibrated by RECAP are approximately uniformly distributed. On data with genuine enrichment, RECAP P-values better estimate the true statistical significance of candidate peaks and yield better false discovery rate estimates, which correlate better with empirical reproducibility. RECAP is a powerful new tool for assessing the true statistical significance of ChIP-seq peak calls.
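
The abstract describes RECAP only at a high level, so the following is a sketch of the general resampling-and-recalibration idea under stated assumptions, not the RECAP implementation. It assumes that null data are produced by pooling ChIP and control reads and randomly re-splitting them into pseudo-ChIP and pseudo-control sets of the original sizes, uses the empirical CDF of the peak caller's P-values on those null sets as the monotone transform, and applies Benjamini-Hochberg to the recalibrated P-values as one standard FDR procedure (the abstract does not say which FDR method RECAP uses). The peak caller toy_call_peaks is hypothetical, as are all other function names.

```python
import bisect
import math
import random
from typing import Callable, List, Sequence, Tuple

# A peak caller maps (ChIP read positions, control read positions) to one
# P-value per reported candidate peak.
PeakCaller = Callable[[Sequence[int], Sequence[int]], List[float]]


def resample_null(chip: Sequence[int], control: Sequence[int],
                  rng: random.Random) -> Tuple[List[int], List[int]]:
    """Assumed null model: pool ChIP and control reads, shuffle, and re-split
    them into pseudo-ChIP and pseudo-control sets of the original sizes."""
    pooled = list(chip) + list(control)
    rng.shuffle(pooled)
    return pooled[:len(chip)], pooled[len(chip):]


def recalibrate(observed: Sequence[float], null: Sequence[float]) -> List[float]:
    """Monotone transform: empirical CDF of the null P-values, with a +1
    correction so that no recalibrated P-value is exactly 0."""
    ranked = sorted(null)
    n = len(ranked)
    return [(bisect.bisect_right(ranked, p) + 1) / (n + 1) for p in observed]


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values (q-values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def recap_sketch(chip: Sequence[int], control: Sequence[int],
                 call_peaks: PeakCaller, n_resamples: int = 10, seed: int = 0):
    """Call peaks on the real data and on resampled null data, then map each
    real P-value through the null empirical CDF."""
    rng = random.Random(seed)
    observed = call_peaks(chip, control)
    null: List[float] = []
    for _ in range(n_resamples):
        null.extend(call_peaks(*resample_null(chip, control, rng)))
    recal = recalibrate(observed, null)
    return observed, recal, benjamini_hochberg(recal)


def toy_call_peaks(chip: Sequence[int], control: Sequence[int],
                   window: int = 100, genome_len: int = 100_000) -> List[float]:
    """Hypothetical peak caller: count reads per window, compare ChIP with the
    scaled local control by a Poisson upper tail, and report only windows whose
    ChIP count exceeds that expectation (a data-driven selection step)."""
    n_win = genome_len // window
    chip_n, ctrl_n = [0] * n_win, [0] * n_win
    for pos in chip:
        chip_n[pos // window] += 1
    for pos in control:
        ctrl_n[pos // window] += 1
    scale = len(chip) / len(control)
    pvals = []
    for x, c in zip(chip_n, ctrl_n):
        lam = max(1.0, c * scale)
        if x > lam:
            term, cdf = math.exp(-lam), 0.0
            for i in range(x):  # accumulate P(X <= x - 1)
                cdf += term
                term *= lam / (i + 1)
            pvals.append(max(0.0, 1.0 - cdf))  # P(X >= x)
    return pvals


if __name__ == "__main__":
    gen = random.Random(42)
    # Null data: ChIP and control reads drawn from the same uniform distribution.
    chip = [gen.randrange(100_000) for _ in range(20_000)]
    control = [gen.randrange(100_000) for _ in range(20_000)]
    raw, recal, q = recap_sketch(chip, control, toy_call_peaks, n_resamples=20)
    print(len(raw), "candidate peaks called on null data")
    print("fraction with raw P < 0.05:", sum(p < 0.05 for p in raw) / len(raw))
    print("fraction with recalibrated P < 0.05:",
          sum(p < 0.05 for p in recal) / len(recal))
```

Because the transform is a non-decreasing function of the original P-value, it keeps the peak caller's ranking of candidates and changes only the significance attached to each one. On null data like the example, recalibrated P-values should fall below 0.05 for roughly 5% of candidates or fewer, whereas the raw P-values of the selected windows do so noticeably more often.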

AVAILABILITY AND IMPLEMENTATION

The RECAP software is available through www.perkinslab.ca or on GitHub at https://github.com/theodorejperkins/RECAP.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

Using combined evidence from replicates to evaluate ChIP-seq peaks.

Bioinformatics. 2015 Sep 1;31(17):2761-9. doi: 10.1093/bioinformatics/btv293. Epub 2015 May 7.

NoPeak: k-mer-based motif discovery in ChIP-Seq data without peak calling.

Bioinformatics. 2021 May 5;37(5):596-602. doi: 10.1093/bioinformatics/btaa845.

WACS: improving ChIP-seq peak calling by optimally weighting controls.

BMC Bioinformatics. 2021 Feb 15;22(1):69. doi: 10.1186/s12859-020-03927-2.

epic2 efficiently finds diffuse domains in ChIP-seq data.

Bioinformatics. 2019 Nov 1;35(21):4392-4393. doi: 10.1093/bioinformatics/btz232.

Characterising ChIP-seq binding patterns by model-based peak shape deconvolution.

BMC Genomics. 2013 Nov 26;14(1):834. doi: 10.1186/1471-2164-14-834.

DiffChIPL: a differential peak analysis method for high-throughput sequencing data with biological replicates based on limma.

Bioinformatics. 2022 Sep 2;38(17):4062-4069. doi: 10.1093/bioinformatics/btac498.

Sensitive and robust assessment of ChIP-seq read distribution using a strand-shift profile.

Bioinformatics. 2018 Jul 15;34(14):2356-2363. doi: 10.1093/bioinformatics/bty137.

Is this the right normalization? A diagnostic tool for ChIP-seq normalization.

BMC Bioinformatics. 2015 May 9;16:150. doi: 10.1186/s12859-015-0579-z.

Enricherator: A Bayesian Method for Inferring Regularized Genome-wide Enrichments from Sequencing Count Data.

J Mol Biol. 2024 Sep 1;436(17):168567. doi: 10.1016/j.jmb.2024.168567. Epub 2024 Apr 5.

Cited by

Analyzing biomarker discovery: Estimating the reproducibility of biomarker sets.

PLoS One. 2022 Jul 28;17(7):e0252697. doi: 10.1371/journal.pone.0252697. eCollection 2022.

Anomaly detection in genomic catalogues using unsupervised multi-view autoencoders.

BMC Bioinformatics. 2021 Sep 25;22(1):460. doi: 10.1186/s12859-021-04359-2.

Molecular and computational approaches to map regulatory elements in 3D chromatin structure.

Epigenetics Chromatin. 2021 Mar 19;14(1):14. doi: 10.1186/s13072-021-00390-y.

A deep learning framework combined with word embedding to identify DNA replication origins.

Sci Rep. 2021 Jan 12;11(1):844. doi: 10.1038/s41598-020-80670-x.

A physical basis for quantitative ChIP-sequencing.

J Biol Chem. 2020 Nov 20;295(47):15826-15837. doi: 10.1074/jbc.RA120.015353. Epub 2020 Sep 29.

Chromatin changes in Anopheles gambiae induced by Plasmodium falciparum infection.

Epigenetics Chromatin. 2019 Jan 7;12(1):5. doi: 10.1186/s13072-018-0250-9.

References

AIControl: replacing matched control experiments with machine learning improves ChIP-seq peak identification.

Nucleic Acids Res. 2019 Jun 4;47(10):e58. doi: 10.1093/nar/gkz156.

The International Human Epigenome Consortium: A Blueprint for Scientific Collaboration and Discovery.

Cell. 2016 Nov 17;167(5):1145-1149. doi: 10.1016/j.cell.2016.11.007.

BIDCHIPS: bias decomposition and removal from ChIP-seq data clarifies true binding signal and its functional correlates.

Epigenetics Chromatin. 2015 Sep 17;8:33. doi: 10.1186/s13072-015-0028-2. eCollection 2015.

Integrative analysis of 111 reference human epigenomes.

Nature. 2015 Feb 19;518(7539):317-30. doi: 10.1038/nature14248.

Integrative analysis of public ChIP-seq experiments reveals a complex multi-cell regulatory landscape.

Nucleic Acids Res. 2015 Feb 27;43(4):e27. doi: 10.1093/nar/gku1280. Epub 2014 Dec 3.

Principles of regulatory information conservation between mouse and human.

Nature. 2014 Nov 20;515(7527):371-375. doi: 10.1038/nature13985.

De novo detection of differentially bound regions for ChIP-seq data using peaks and windows: controlling error rates correctly.

Nucleic Acids Res. 2014 Jun;42(11):e95. doi: 10.1093/nar/gku351. Epub 2014 May 22.

Spatial clustering for identification of ChIP-enriched regions (SICER) to map regions of histone methylation patterns in embryonic stem cells.

Methods Mol Biol. 2014;1150:97-111. doi: 10.1007/978-1-4939-0512-6_5.

Epigenetics, chromatin and genome organization: recent advances from the ENCODE project.

J Intern Med. 2014 Sep;276(3):201-14. doi: 10.1111/joim.12231. Epub 2014 Mar 27.

Adaptive bandwidth kernel density estimation for next-generation sequencing data.

BMC Proc. 2013 Dec 20;7(Suppl 7):S7. doi: 10.1186/1753-6561-7-S7-S7.
