Genome-wide localization of protein-DNA binding and histone modification by a Bayesian change-point method with ChIP-seq data.

Affiliation

Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, United States of America.

Publication information

PLoS Comput Biol. 2012;8(7):e1002613. doi: 10.1371/journal.pcbi.1002613. Epub 2012 Jul 26.

DOI: 10.1371/journal.pcbi.1002613

PMID: 22844240

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3406014/

Abstract

Next-generation sequencing (NGS) technologies have matured considerably since their introduction, and a focus has been placed on developing sophisticated analytical tools to deal with the growing volumes of data. Chromatin immunoprecipitation sequencing (ChIP-seq), a major application of NGS, is a widely adopted technique for examining protein-DNA interactions and is commonly used to investigate epigenetic signatures of diffuse histone marks. These datasets have notoriously high variance and subtle levels of enrichment across large expanses, making enriched regions exceedingly difficult to define. Window-based heuristic models and finite-state hidden Markov models (HMMs) have been used with some success in analyzing ChIP-seq data but with lingering limitations. To improve the ability to detect broad regions of enrichment, we developed a stochastic Bayesian Change-Point (BCP) method, which addresses some of these unresolved issues. BCP makes use of recent advances in infinite-state HMMs by obtaining explicit formulas for posterior means of read densities. These posterior means can be used to categorize the genome into enriched and unenriched segments, as is customarily done, or examined for more detailed relationships, since the underlying subpeaks are preserved rather than simplified into a binary classification. BCP performs a near-exhaustive search of all possible change points between different posterior means at high resolution to minimize the subjectivity of window sizes, and it is computationally efficient due to a speed-up algorithm and the explicit formulas it employs. In the absence of a well-established "gold standard" for diffuse histone mark enrichment, we corroborated BCP's island detection accuracy and reproducibility using various forms of empirical evidence. We show that BCP is especially suited for analysis of diffuse histone ChIP-seq data but also effective in analyzing punctate transcription factor ChIP datasets, making it widely applicable for numerous experiment types.
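To make the idea of "explicit formulas for posterior means of read densities" concrete, the following is a minimal toy sketch and not the BCP implementation. It assumes a much simpler model than the paper's: binned read counts are Poisson with a piecewise-constant rate, each segment's rate has a conjugate Gamma(a, b) prior, and every boundary between adjacent bins is a change point independently with probability p. The function names, the parameter values (`p`, `a`, `b`), the synthetic counts, and the enrichment threshold are all hypothetical choices for illustration. The sketch sums over every possible segmentation exactly in O(n^2) time, without the speed-up algorithm the abstract mentions.

```python
import math

def log_marginal(s, m, a, b):
    """Log marginal likelihood of one segment: total count s over m bins,
    Poisson counts with a Gamma(shape=a, rate=b) prior on the rate.
    The 1/y! factors are dropped; they are identical for every segmentation."""
    return (a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + s) - (a + s) * math.log(b + m))

def logsumexp(xs):
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))

def posterior_mean_rates(counts, p=0.05, a=1.0, b=0.2):
    """Posterior mean of the Poisson rate in each bin, averaged over all
    segmentations (assumed toy model, see lead-in)."""
    n = len(counts)
    cs = [0]                      # prefix sums: cs[k] = sum(counts[:k])
    for y in counts:
        cs.append(cs[-1] + y)
    lp, lq = math.log(p), math.log(1.0 - p)

    def logw(i, j):
        # Weight of a single segment covering bins i..j-1 (no change inside it).
        return log_marginal(cs[j] - cs[i], j - i, a, b) + (j - i - 1) * lq

    # fwd[j]: log total weight of all segmentations of counts[:j].
    fwd = [0.0] * (n + 1)
    for j in range(1, n + 1):
        fwd[j] = logsumexp([fwd[i] + (lp if i > 0 else 0.0) + logw(i, j)
                            for i in range(j)])
    # bwd[i]: log total weight of all segmentations of counts[i:].
    bwd = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        bwd[i] = logsumexp([logw(i, j) + (lp if j < n else 0.0) + bwd[j]
                            for j in range(i + 1, n + 1)])

    # Accumulate P(segment) * E[rate | segment] with a difference array.
    diff = [0.0] * (n + 1)
    for i in range(n):
        for j in range(i + 1, n + 1):
            log_post = (fwd[i] + (lp if i > 0 else 0.0) + logw(i, j)
                        + (lp if j < n else 0.0) + bwd[j] - fwd[n])
            seg_mean = (a + cs[j] - cs[i]) / (b + (j - i))  # conjugate update
            contrib = math.exp(log_post) * seg_mean
            diff[i] += contrib
            diff[j] -= contrib
    means, running = [], 0.0
    for k in range(n):
        running += diff[k]
        means.append(running)
    return means

# Synthetic example: a broad enriched block flanked by background bins.
counts = [2] * 20 + [12] * 20 + [2] * 20
means = posterior_mean_rates(counts)
enriched = [m > 6.0 for m in means]   # hypothetical threshold
```

Because the output is a per-bin posterior mean rather than a binary call, it can either be thresholded into enriched and unenriched segments, as in the last line, or inspected directly for finer structure within a broad domain.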

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dacb/3406014/e2836cb72462/pcbi.1002613.g001.jpg

Similar articles

A novel Bayesian change-point algorithm for genome-wide analysis of diverse ChIPseq data types.

J Vis Exp. 2012 Dec 10(70):e4273. doi: 10.3791/4273.

Integrative analyses for omics data: a Bayesian mixture model to assess the concordance of ChIP-chip and ChIP-seq measurements.

J Toxicol Environ Health A. 2012;75(8-10):461-70. doi: 10.1080/15287394.2012.674914.

Genome-Wide Identification of Transcription Factor-Binding Sites in Quiescent Adult Neural Stem Cells.

Methods Mol Biol. 2018;1686:265-286. doi: 10.1007/978-1-4939-7371-2_19.

Software for rapid time dependent ChIP-sequencing analysis (TDCA).

BMC Bioinformatics. 2017 Nov 25;18(1):521. doi: 10.1186/s12859-017-1936-x.

Epigenetic analysis: ChIP-chip and ChIP-seq.

Methods Mol Biol. 2012;802:377-87. doi: 10.1007/978-1-61779-400-1_25.

Role of ChIP-seq in the discovery of transcription factor binding sites, differential gene regulation mechanism, epigenetic marks and beyond.

Cell Cycle. 2014;13(18):2847-52. doi: 10.4161/15384101.2014.949201.

A fully Bayesian hidden Ising model for ChIP-seq data analysis.

Biostatistics. 2012 Jan;13(1):113-28. doi: 10.1093/biostatistics/kxr029. Epub 2011 Sep 13.

HMCan: a method for detecting chromatin modifications in cancer samples using ChIP-seq data.

Bioinformatics. 2013 Dec 1;29(23):2979-86. doi: 10.1093/bioinformatics/btt524. Epub 2013 Sep 9.

Modelling ChIP-seq Data Using HMMs.

Methods Mol Biol. 2017;1552:115-122. doi: 10.1007/978-1-4939-6753-7_8.

Cited by

Distinct BMP-Smad signaling outputs confer diverse functions in dental mesenchyme.

Development. 2025 Jun 15;152(12). doi: 10.1242/dev.204563. Epub 2025 Jun 19.

A unified hypothesis-free feature extraction framework for diverse epigenomic data.

Bioinform Adv. 2025 Mar 8;5(1):vbaf013. doi: 10.1093/bioadv/vbaf013. eCollection 2025.

Multiplex-GAM: genome-wide identification of chromatin contacts yields insights overlooked by Hi-C.

Nat Methods. 2023 Jul;20(7):1037-1047. doi: 10.1038/s41592-023-01903-1. Epub 2023 Jun 19.

Dynamic antagonism between key repressive pathways maintains the placental epigenome.

Nat Cell Biol. 2023 Apr;25(4):579-591. doi: 10.1038/s41556-023-01114-y. Epub 2023 Apr 6.

Transcription factor protein interactomes reveal genetic determinants in heart disease.

Cell. 2022 Mar 3;185(5):794-814.e30. doi: 10.1016/j.cell.2022.01.021. Epub 2022 Feb 18.

Brahma safeguards canalization of cardiac mesoderm differentiation.

Nature. 2022 Feb;602(7895):129-134. doi: 10.1038/s41586-021-04336-y. Epub 2022 Jan 26.

A flexible ChIP-sequencing simulation toolkit.

BMC Bioinformatics. 2021 Apr 20;22(1):201. doi: 10.1186/s12859-021-04097-5.

Deciphering hierarchical organization of topologically associated domains through change-point testing.

BMC Bioinformatics. 2021 Apr 10;22(1):183. doi: 10.1186/s12859-021-04113-8.

Seq-ing answers: Current data integration approaches to uncover mechanisms of transcriptional regulation.

Comput Struct Biotechnol J. 2020 May 31;18:1330-1341. doi: 10.1016/j.csbj.2020.05.018. eCollection 2020.

Improved detection of epigenomic marks with mixed-effects hidden Markov models.

Biometrics. 2019 Dec;75(4):1401-1413. doi: 10.1111/biom.13083. Epub 2019 Oct 17.
