Accounting for GC-content bias reduces systematic errors and batch effects in ChIP-seq data.

Affiliations

Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA.

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA.

Publication information

Genome Res. 2017 Nov;27(11):1930-1938. doi: 10.1101/gr.220673.117. Epub 2017 Oct 12.

DOI:10.1101/gr.220673.117
PMID:29025895
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5668949/
Abstract

The main application of ChIP-seq technology is the detection of genomic regions that bind to a protein of interest. A large part of functional genomics' public catalogs is based on ChIP-seq data. These catalogs rely on peak calling algorithms that infer protein-binding sites by detecting genomic regions associated with more mapped reads (coverage) than expected by chance, as a result of the experimental protocol's lack of perfect specificity. We find that GC-content bias accounts for substantial variability in the observed coverage for ChIP-seq experiments and that this variability leads to false-positive peak calls. More concerning is that the GC effect varies across experiments, with the effect strong enough to result in a substantial number of peaks called differently when different laboratories perform experiments on the same cell line. However, accounting for GC-content bias in ChIP-seq is challenging because the binding sites of interest tend to be more common in high GC-content regions, which confounds real biological signals with unwanted variability. To account for this challenge, we introduce a statistical approach that accounts for GC effects on both nonspecific noise and signal induced by the binding site. The method can be used to account for this bias in binding quantification as well as to improve existing peak calling algorithms. We use this approach to show a reduction in false-positive peaks as well as improved consistency across laboratories.
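To make the idea of a GC effect on coverage concrete, the sketch below stratifies genomic windows by GC fraction, estimates the mean read count per stratum, and expresses a window's count relative to what is typical for its GC stratum. This is an illustrative simplification, not the statistical approach introduced in the paper: the function names, the five-stratum binning, and the toy windows are all assumptions made for the example. In particular, this naive background-only adjustment ignores the confounding described in the abstract (binding sites are more common in GC-rich regions), which is the problem the authors' method is designed to handle.

```python
from collections import defaultdict


def gc_stratum(seq, n_strata=5):
    """Assign a sequence to one of n_strata equal-width GC-fraction strata.

    Ambiguous bases (anything other than A/C/G/T) are ignored.
    Integer arithmetic avoids floating-point edge cases at stratum boundaries.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        return 0  # no usable bases; place in the lowest stratum
    return min(gc * n_strata // total, n_strata - 1)


def mean_coverage_by_gc(windows, n_strata=5):
    """Mean read count per GC stratum.

    windows: iterable of (sequence, read_count) pairs (hypothetical input format).
    Returns a dict mapping stratum index to mean read count.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seq, reads in windows:
        s = gc_stratum(seq, n_strata)
        sums[s] += reads
        counts[s] += 1
    return {s: sums[s] / counts[s] for s in sorted(sums)}


def gc_adjusted_enrichment(seq, reads, expected_by_gc, n_strata=5):
    """Ratio of observed reads to the mean for windows with similar GC content."""
    return reads / expected_by_gc[gc_stratum(seq, n_strata)]


# Toy windows (made-up sequences and counts, for illustration only).
windows = [
    ("ATATATATAT", 2),
    ("ATATATATGC", 4),
    ("ATGCATGCAT", 6),
    ("GCGCGCGCGC", 10),
    ("GCGCGCGCAT", 14),
]
expected = mean_coverage_by_gc(windows)

# The same raw count of 24 reads is far less surprising in a GC-rich window.
print(gc_adjusted_enrichment("GCGCGCGCGC", 24, expected))
print(gc_adjusted_enrichment("ATATATATAT", 24, expected))
```

In this toy setting, a peak caller that compares raw counts against a single genome-wide background would treat both windows identically, whereas the GC-stratified background rates them very differently. That is one way GC-dependent coverage could produce false-positive calls in GC-rich regions.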

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4637/5668949/c44a5a36b4ae/1930f01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4637/5668949/a1bd9f0a8e62/1930f02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4637/5668949/be2ee93e6e8d/1930f03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4637/5668949/3145347e428d/1930f04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4637/5668949/8d0b232a51a7/1930f05.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4637/5668949/9c31ce5d3d25/1930f06.jpg
