OccuPeak: ChIP-Seq peak calling based on internal background modelling.

Authors

de Boer Bouke A, van Duijvenboden Karel, van den Boogaard Malou, Christoffels Vincent M, Barnett Phil, Ruijter Jan M

Affiliation

Department of Anatomy, Embryology & Physiology, Academic Medical Centre, Amsterdam, The Netherlands.

Publication

PLoS One. 2014 Jun 17;9(6):e99844. doi: 10.1371/journal.pone.0099844. eCollection 2014.

Abstract

ChIP-seq has become a major tool for the genome-wide identification of transcription factor binding sites and histone modification sites. Most peak-calling algorithms require input control datasets to model the occurrence of background reads and to account for local sequencing and GC bias. However, the GC content of reads in Input-seq datasets deviates significantly from that in ChIP-seq datasets. Moreover, we observed that a commonly used peak-calling program performed equally well with a simulated uniform background set as with an Input-seq dataset. This contradicts the assumption that input control datasets are necessary to faithfully reflect the background read distribution. Because the GC content of the abundant single reads in ChIP-seq datasets is similar to that of randomly sampled regions, we designed a peak-calling algorithm with a background model based on overlapping single reads. The application, OccuPeak, uses the abundant low-frequency tags present in each ChIP-seq dataset to model the background, thereby avoiding the need for additional datasets. Analysis of the performance of OccuPeak showed that its model parameters are robust. Its measure of peak significance, the excess ratio, depends only on the tag density of a peak and the global noise level. Compared to the commonly used peak-calling applications MACS and CisGenome, OccuPeak had the highest sensitivity in an enhancer identification benchmark test and performed similarly in overlap tests of transcription factor occupation with DNase I hypersensitive sites and H3K27ac sites. Moreover, peaks called by OccuPeak were significantly enriched with cardiac disease-associated SNPs. OccuPeak runs as a standalone application and does not require extensive tweaking of parameters, making its use straightforward and user-friendly.

AVAILABILITY

http://occupeak.hfrc.nl.
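
The abstract describes two ideas: estimating background from the low-frequency tags of the ChIP-seq dataset itself, and scoring a peak by a ratio that depends only on its tag density and the global noise level. The Python sketch below is a toy illustration of that general idea and not OccuPeak's actual implementation. The fixed-size windows, the `max_background_count` cutoff, and all function names are assumptions introduced here for illustration; the paper's own definitions of the background model and the excess ratio should be consulted for the real method.

```python
from collections import Counter


def window_counts(tag_positions, genome_length, window_size):
    """Count tags per fixed-size window (hypothetical binning scheme)."""
    n_windows = -(-genome_length // window_size)  # ceiling division
    counts = Counter(pos // window_size for pos in tag_positions)
    return [counts.get(i, 0) for i in range(n_windows)]


def background_density(counts, window_size, max_background_count=1):
    """Estimate background tags per bp from low-count windows only.

    Assumption: windows holding at most `max_background_count` tags are
    treated as noise; windows above that cutoff are ignored.
    """
    low = [c for c in counts if c <= max_background_count]
    if not low:
        raise ValueError("no low-count windows available to model background")
    return sum(low) / (len(low) * window_size)


def excess_ratio(region_tags, region_length, bg_density):
    """Observed tag density of a region divided by the background density."""
    if region_length <= 0 or bg_density <= 0:
        raise ValueError("region length and background density must be positive")
    return (region_tags / region_length) / bg_density


# Toy data: eight tag start positions on a 1000 bp sequence.
tags = [5, 150, 420, 610, 615, 620, 630, 905]
counts = window_counts(tags, genome_length=1000, window_size=100)
bg = background_density(counts, window_size=100)
ratios = [excess_ratio(c, 100, bg) for c in counts]
```

In this toy setup the window covering positions 600 to 699 holds four tags while every other window holds at most one, so it is the only window whose ratio stands clearly above the internally estimated background. No input control dataset enters the calculation.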
