Normalization, bias correction, and peak calling for ChIP-seq.

Author information

Diaz Aaron, Park Kiyoub, Lim Daniel A, Song Jun S

Affiliation

University of California, San Francisco, USA.

Publication information

Stat Appl Genet Mol Biol. 2012 Mar 31;11(3):Article 9. doi: 10.1515/1544-6115.1750.

DOI:10.1515/1544-6115.1750

PMID:22499706

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3342857/

Abstract

Next-generation sequencing is rapidly transforming our ability to profile the transcriptional, genetic, and epigenetic states of a cell. In particular, sequencing DNA from the immunoprecipitation of protein-DNA complexes (ChIP-seq) and methylated DNA (MeDIP-seq) can reveal the locations of protein binding sites and epigenetic modifications. These approaches contain numerous biases which may significantly influence the interpretation of the resulting data. Rigorous computational methods for detecting and removing such biases are still lacking. Also, multi-sample normalization still remains an important open problem. This theoretical paper systematically characterizes the biases and properties of ChIP-seq data by comparing 62 separate publicly available datasets, using rigorous statistical models and signal processing techniques. Statistical methods for separating ChIP-seq signal from background noise, as well as correcting enrichment test statistics for sequence-dependent and sonication biases, are presented. Our method effectively separates reads into signal and background components prior to normalization, improving the signal-to-noise ratio. Moreover, most peak callers currently use a generic null model which suffers from low specificity at the sensitivity level requisite for detecting subtle, but true, ChIP enrichment. The proposed method of determining a cell type-specific null model, which accounts for cell type-specific biases, is shown to be capable of achieving a lower false discovery rate at a given significance threshold than current methods.

