Empirical methods for controlling false positives and estimating confidence in ChIP-Seq peaks.

Authors

Nix David A, Courdy Samir J, Boucher Kenneth M

Affiliation

Huntsman Cancer Institute, Department of Research Informatics, University of Utah, Salt Lake City, Utah, 84105, USA.

Publication

BMC Bioinformatics. 2008 Dec 5;9:523. doi: 10.1186/1471-2105-9-523.

Abstract

BACKGROUND

High-throughput signature sequencing holds much promise, including the ready identification of in vivo transcription factor binding sites, histone modifications, changes in chromatin structure and patterns of DNA methylation across entire genomes. In these experiments, chromatin immunoprecipitation is used to enrich for particular DNA sequences of interest and signature sequencing is used to map the regions to the genome (ChIP-Seq). Elucidation of these sites of DNA-protein binding/modification is proving instrumental in reconstructing networks of gene regulation and chromatin remodelling that direct development, response to cellular perturbation, and neoplastic transformation.

RESULTS

Here we present a package of algorithms and software that makes use of control input data to reduce false positives and estimate confidence in ChIP-Seq peaks. Several different methods were compared using two simulated spike-in datasets. Using control input data together with a normalized difference score was found to more than double the recovery of ChIP-Seq peaks at a 5% false discovery rate (FDR). Moreover, both a binomial p-value/q-value and an empirical FDR were found to predict the true FDR to within 2- to 3-fold, and both are more reliable estimators of confidence than a global Poisson p-value. These methods were then used to reanalyze Johnson et al.'s neuron-restrictive silencer factor (NRSF) ChIP-Seq data without relying on extensive qPCR-validated NRSF sites or the presence of NRSF binding motifs for setting thresholds.
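As a rough illustration of the window-level statistics named above, the following Python sketch scores genomic windows from ChIP and control read counts. The formulas are assumptions made for illustration and are not stated in the abstract: the normalized difference is taken as (t - c) / sqrt(t + c), the binomial p-value assumes equal sequencing depth in ChIP and control (p = 0.5), q-values come from the Benjamini-Hochberg procedure, and the empirical FDR is taken as the ratio of control-versus-control windows to ChIP-versus-control windows passing a score threshold. All function names are hypothetical, and the USeq implementation may differ.

```python
import math


def normalized_difference(t, c):
    """Assumed score: (t - c) / sqrt(t + c); 0.0 when the window is empty."""
    total = t + c
    return 0.0 if total == 0 else (t - c) / math.sqrt(total)


def binomial_upper_p(t, c, p=0.5):
    """P(X >= t) for X ~ Binomial(t + c, p).

    p = 0.5 assumes the ChIP and control libraries have equal depth.
    """
    n = t + c
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(t, n + 1))


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def empirical_fdr(threshold, chip_scores, control_scores):
    """Assumed estimate: (# control-vs-control windows >= threshold)
    divided by (# ChIP-vs-control windows >= threshold), capped at 1."""
    n_control = sum(s >= threshold for s in control_scores)
    n_chip = sum(s >= threshold for s in chip_scores)
    return 0.0 if n_chip == 0 else min(1.0, n_control / n_chip)


# Made-up (ChIP, control) read counts per window, for illustration only.
windows = [(30, 10), (12, 11), (8, 9), (25, 5)]
scores = [normalized_difference(t, c) for t, c in windows]
pvals = [binomial_upper_p(t, c) for t, c in windows]
qvals = bh_qvalues(pvals)
for (t, c), s, q in zip(windows, scores, qvals):
    print(f"chip={t} control={c} score={s:.2f} q={q:.3g}")
```

In this sketch a window would be called a peak when its q-value (or the empirical FDR at its score) falls below a chosen cutoff such as 0.05, which needs no prior knowledge of the ChIP target.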

CONCLUSION

The methods developed and tested here show considerable promise for reducing false positives and estimating confidence in ChIP-Seq data without any prior knowledge of the ChIP target. They are part of a larger open source package freely available from http://useq.sourceforge.net/.
