Optimizing ChIP-seq peak detectors using visual labels and supervised machine learning.

Authors

Hocking Toby Dylan, Goerner-Potvin Patricia, Morin Andreanne, Shao Xiaojian, Pastinen Tomi, Bourque Guillaume

Affiliation

Department of Human Genetics, McGill University, H3A-1A4, Montréal, Canada.

Publication

Bioinformatics. 2017 Feb 15;33(4):491-499. doi: 10.1093/bioinformatics/btw672.

DOI: 10.1093/bioinformatics/btw672
PMID: 27797775
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5408812/
Abstract

MOTIVATION

Many peak detection algorithms have been proposed for ChIP-seq data analysis, but it is not obvious which algorithm and what parameters are optimal for any given dataset. In contrast, regions with and without obvious peaks can be easily labeled by visual inspection of aligned read counts in a genome browser. We propose a supervised machine learning approach for ChIP-seq data analysis, using labels that encode qualitative judgments about which genomic regions contain or do not contain peaks. The main idea is to manually label a small subset of the genome, and then learn a model that makes consistent peak predictions on the rest of the genome.
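To make the labeling idea concrete, here is a minimal Python sketch of how predicted peaks could be scored against visual labels. It assumes only two label types, "peaks" (the region should contain at least one predicted peak) and "noPeaks" (the region should contain none). The names `label_error`, `overlaps`, and the toy coordinates are hypothetical and are not taken from the paper or from the PeakError R package, which may define label types and errors differently.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int]      # half-open interval [start, end) in base pairs
Label = Tuple[int, int, str]  # (start, end, "peaks" or "noPeaks")


def overlaps(a: Region, b: Region) -> bool:
    """Return True when two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def label_error(peaks: List[Region], labels: List[Label]) -> Dict[str, int]:
    """Count labels that the predicted peaks disagree with.

    Assumed rules (a simplification for illustration):
      - a "noPeaks" label overlapped by any predicted peak is a false positive
      - a "peaks" label overlapped by no predicted peak is a false negative
    """
    false_pos = 0
    false_neg = 0
    for start, end, kind in labels:
        n_overlapping = sum(overlaps(p, (start, end)) for p in peaks)
        if kind == "noPeaks":
            if n_overlapping > 0:
                false_pos += 1
        elif kind == "peaks":
            if n_overlapping == 0:
                false_neg += 1
        else:
            raise ValueError(f"unknown label type: {kind}")
    return {
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "errors": false_pos + false_neg,
        "labels": len(labels),
    }


# Toy example: three hand-made labels and two predicted peaks.
labels = [(100, 200, "peaks"), (300, 400, "noPeaks"), (500, 600, "peaks")]
predicted = [(120, 180), (350, 360)]
result = label_error(predicted, labels)
print(result)
```

In this toy case the first label is satisfied, the second is violated by the peak at 350-360, and the third has no overlapping peak, so two of the three labels are counted as errors.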

RESULTS

We created 7 new histone mark datasets with 12 826 visually determined labels, and analyzed 3 existing transcription factor datasets. We observed that default peak detection parameters yield high false positive rates, which can be reduced by learning parameters using a relatively small training set of labeled data from the same experiment type. We also observed that labels from different people are highly consistent. Overall, these data indicate that our supervised labeling method is useful for quantitatively training and testing peak detection algorithms.
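The parameter-learning step described above can be illustrated with a small grid search. The sketch below is not the authors' method: it assumes a toy detector that calls a peak wherever coverage meets a threshold, and it picks the threshold with the fewest label errors on a small labeled training set. The functions `call_peaks`, `count_errors`, and `learn_threshold`, and all data values, are hypothetical.

```python
from typing import List, Sequence, Tuple

Region = Tuple[int, int]      # half-open interval [start, end)
Label = Tuple[int, int, str]  # (start, end, "peaks" or "noPeaks")


def call_peaks(coverage: Sequence[int], threshold: int) -> List[Region]:
    """Toy detector: maximal runs of positions with coverage >= threshold."""
    peaks: List[Region] = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= threshold and start is None:
            start = pos                      # a run begins here
        elif depth < threshold and start is not None:
            peaks.append((start, pos))       # the run ended just before pos
            start = None
    if start is not None:
        peaks.append((start, len(coverage)))  # run extends to the last position
    return peaks


def count_errors(peaks: List[Region], labels: List[Label]) -> int:
    """Number of labels that the predicted peaks contradict."""
    errors = 0
    for start, end, kind in labels:
        hit = any(s < end and start < e for s, e in peaks)
        if kind == "noPeaks" and hit:
            errors += 1                      # false positive
        elif kind == "peaks" and not hit:
            errors += 1                      # false negative
    return errors


def learn_threshold(coverage, labels, candidates) -> int:
    """Pick the candidate with the fewest training label errors.

    Ties go to the first candidate in iteration order.
    """
    return min(candidates, key=lambda t: count_errors(call_peaks(coverage, t), labels))


# Made-up coverage profile: two strong peaks and one weak bump of depth 3.
coverage = [1, 1, 2, 9, 9, 8, 1, 1, 3, 3, 3, 1, 1, 1, 7, 8, 7, 1, 1, 1]
train_labels = [(2, 7, "peaks"), (7, 12, "noPeaks"), (13, 18, "peaks")]

default_threshold = 2  # stands in for a permissive default setting
best = learn_threshold(coverage, train_labels, range(1, 11))
print("default errors:", count_errors(call_peaks(coverage, default_threshold), train_labels))
print("learned threshold:", best)
print("learned errors:", count_errors(call_peaks(coverage, best), train_labels))
```

With these made-up values the permissive default calls the weak bump as a peak and violates the "noPeaks" label, while the learned threshold satisfies all three labels. In practice the learned parameter would then be evaluated on held-out labels rather than on the training labels themselves.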

AVAILABILITY AND IMPLEMENTATION

Labeled histone mark data: http://cbio.ensmp.fr/~thocking/chip-seq-chunk-db/. R package to compute the label error of predicted peaks: https://github.com/tdhock/PeakError.

CONTACTS

toby.hocking@mail.mcgill.ca or guil.bourque@mcgill.ca.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
