False positive peaks in ChIP-seq and other sequencing-based functional assays caused by unannotated high copy number regions.

Affiliation

Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA.

Publication information

Bioinformatics. 2011 Aug 1;27(15):2144-6. doi: 10.1093/bioinformatics/btr354. Epub 2011 Jun 19.

DOI:10.1093/bioinformatics/btr354

PMID:21690102

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3137225/

Abstract

MOTIVATION

Sequencing-based assays such as ChIP-seq, DNase-seq and MNase-seq have become important tools for genome annotation. In these assays, short sequence reads enriched for loci of interest are mapped to a reference genome to determine their origin. Here, we consider whether false positive peak calls can be caused by a particular type of error in the reference genome: multicopy sequences that have been incorrectly assembled and collapsed into a single copy.
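A simple calculation shows why such a collapse can mimic a peak. It rests on a simplifying assumption that is ours, not the authors': reads are sampled uniformly across the true genome. Let $\bar{d}$ be the mean read depth per genomic copy, and suppose a sequence present in $c$ near-identical copies appears only once in the reference. Reads from all $c$ copies then map to that single locus. Because the reference holds only one copy, those reads also look uniquely mapped.

```latex
\mathbb{E}[d_{\text{locus}}] \approx c\,\bar{d},
\qquad
\text{apparent enrichment} = \frac{\mathbb{E}[d_{\text{locus}}]}{\bar{d}} \approx c
```

Take $c = 10$ as an illustrative value. The locus then carries roughly ten times the background depth in any assay, whether or not a protein binds there. Unless a matched control shows the same excess, a peak caller can report it as a peak.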

RESULTS

Using sequencing data from the 1000 Genomes Project, we systematically scanned the human genome for regions of high sequencing depth. These regions are highly enriched for erroneously inferred transcription factor binding sites, positions of nucleosomes and regions of open chromatin. We suggest a simple masking procedure to remove these regions and reduce false positive calls.
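The depth scan can be sketched in Python. This is a minimal illustration under our own assumptions, not the authors' pipeline.

- We assume per-window read depths for one chromosome have already been computed, for example from pooled 1000 Genomes alignments.
- Windows are flagged when their depth exceeds a hypothetical cutoff of `fold` times the median window depth.
- Adjacent flagged windows are merged into intervals.
- `high_depth_regions` and its `fold` default are illustrative names and values; they do not reproduce the paper's actual thresholds.

```python
from statistics import median


def high_depth_regions(depths, window_size, fold=2.0):
    """Return merged half-open (start, end) intervals of high-depth windows.

    depths: per-window read depths along one chromosome; window i covers
            [i * window_size, (i + 1) * window_size).
    fold:   hypothetical cutoff multiplier (an assumption, not the paper's value);
            a window is flagged when its depth exceeds fold * median depth.
    """
    if not depths:
        return []
    # The median is robust to the outlying high-copy windows we are looking for.
    cutoff = fold * median(depths)
    regions = []
    for i, depth in enumerate(depths):
        if depth > cutoff:
            start, end = i * window_size, (i + 1) * window_size
            if regions and regions[-1][1] == start:
                # Adjacent flagged window: extend the previous interval.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions
```

For example, `high_depth_regions([10, 11, 9, 45, 50, 10, 12, 40, 10], 100)` returns `[(300, 500), (700, 800)]`. We use the median rather than the mean for the cutoff because the mean would be inflated by the very high-copy windows the scan is meant to find.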

AVAILABILITY

Files for masking out these regions are available at eqtl.uchicago.edu.
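Applying such a mask to a peak set amounts to an interval-overlap filter. The sketch below makes several assumptions.

- The masked regions and peaks have already been parsed into `(chrom, start, end)` tuples.
- Coordinates are 0-based and half-open, as in BED.
- We have not verified the exact layout of the files distributed at eqtl.uchicago.edu.
- `mask_peaks` is a hypothetical helper, not part of any released tool.

```python
from bisect import bisect_left


def mask_peaks(peaks, masked):
    """Return the peaks that do not overlap any masked interval.

    peaks, masked: lists of (chrom, start, end) tuples, 0-based, half-open.
    """
    # Group masked intervals by chromosome, sorted by start, merging overlaps
    # so that both starts and ends increase within each chromosome.
    merged = {}
    for chrom, start, end in sorted(masked):
        ivs = merged.setdefault(chrom, [])
        if ivs and start <= ivs[-1][1]:
            ivs[-1] = (ivs[-1][0], max(ivs[-1][1], end))
        else:
            ivs.append((start, end))
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in merged.items()}

    kept = []
    for chrom, start, end in peaks:
        ivs = merged.get(chrom, [])
        # i = number of masked intervals that start before the peak ends.
        i = bisect_left(starts.get(chrom, []), end)
        # Of those, the last has the largest end; the peak overlaps the
        # mask exactly when that end lies beyond the peak start.
        if i > 0 and ivs[i - 1][1] > start:
            continue
        kept.append((chrom, start, end))
    return kept
```

On BED files, the same filtering can be done with `bedtools intersect -v -a peaks.bed -b mask.bed`. That command reports only the entries in `-a` that have no overlap in `-b`.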

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c89/3137225/bc15290efb94/btr354f1.jpg

Similar articles

Discovering transcription factor binding sites in highly repetitive regions of genomes with multi-read analysis of ChIP-Seq data.

PLoS Comput Biol. 2011 Jul;7(7):e1002111. doi: 10.1371/journal.pcbi.1002111. Epub 2011 Jul 14.

Using combined evidence from replicates to evaluate ChIP-seq peaks.

Bioinformatics. 2015 Sep 1;31(17):2761-9. doi: 10.1093/bioinformatics/btv293. Epub 2015 May 7.

AREM: aligning short reads from ChIP-sequencing by expectation maximization.

J Comput Biol. 2011 Nov;18(11):1495-505. doi: 10.1089/cmb.2011.0185. Epub 2011 Oct 28.

CNV-guided multi-read allocation for ChIP-seq.

Bioinformatics. 2014 Oct 15;30(20):2860-7. doi: 10.1093/bioinformatics/btu402. Epub 2014 Jun 24.

The Triform algorithm: improved sensitivity and specificity in ChIP-Seq peak finding.

BMC Bioinformatics. 2012 Jul 24;13:176. doi: 10.1186/1471-2105-13-176.

Characterising ChIP-seq binding patterns by model-based peak shape deconvolution.

BMC Genomics. 2013 Nov 26;14(1):834. doi: 10.1186/1471-2164-14-834.

Empirical methods for controlling false positives and estimating confidence in ChIP-Seq peaks.

BMC Bioinformatics. 2008 Dec 5;9:523. doi: 10.1186/1471-2105-9-523.

A novel statistical method for quantitative comparison of multiple ChIP-seq datasets.

Bioinformatics. 2015 Jun 15;31(12):1889-96. doi: 10.1093/bioinformatics/btv094. Epub 2015 Feb 13.

BayesPeak: Bayesian analysis of ChIP-seq data.

BMC Bioinformatics. 2009 Sep 21;10:299. doi: 10.1186/1471-2105-10-299.

Cited by

Benchmarking transcription factor binding site prediction models: a comparative analysis on synthetic and biological data.

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf363.

A large-scale benchmark for network inference from single-cell perturbation data.

Commun Biol. 2025 Mar 11;8(1):412. doi: 10.1038/s42003-025-07764-y.

ChIP-Based Nuclear DNA Isolation for Genome Sequencing in to Remove Cytosol and Bacterial DNA Contamination.

Plants (Basel). 2023 May 5;12(9):1883. doi: 10.3390/plants12091883.

Comprehensive Survey of ChIP-Seq Datasets to Identify Candidate Iron Homeostasis Genes Regulated by Chromatin Modifications.

Methods Mol Biol. 2023;2665:95-111. doi: 10.1007/978-1-0716-3183-6_9.

A survey on algorithms to characterize transcription factor binding sites.

Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad156.

excluderanges: exclusion sets for T2T-CHM13, GRCm39, and other genome assemblies.

Bioinformatics. 2023 Apr 3;39(4). doi: 10.1093/bioinformatics/btad198.

Quality control and evaluation of plant epigenomics data.

Plant Cell. 2022 Jan 20;34(1):503-513. doi: 10.1093/plcell/koab255.

Comparative analysis of commonly used peak calling programs for ChIP-Seq analysis.

Genomics Inform. 2020 Dec;18(4):e42. doi: 10.5808/GI.2020.18.4.e42. Epub 2020 Dec 14.

Hexavalent chromium promotes differential binding of CTCF to its cognate sites in Euchromatin.

Epigenetics. 2021 Dec;16(12):1361-1376. doi: 10.1080/15592294.2020.1864168. Epub 2021 Jan 7.

Cross-species regulatory sequence activity prediction.

PLoS Comput Biol. 2020 Jul 20;16(7):e1008050. doi: 10.1371/journal.pcbi.1008050. eCollection 2020 Jul.
