Genome-wide localization of protein-DNA binding and histone modification by a Bayesian change-point method with ChIP-seq data.

Affiliation

Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, United States of America.

Publication information

PLoS Comput Biol. 2012;8(7):e1002613. doi: 10.1371/journal.pcbi.1002613. Epub 2012 Jul 26.

Abstract

Next-generation sequencing (NGS) technologies have matured considerably since their introduction, and a focus has been placed on developing sophisticated analytical tools to deal with the amassing volumes of data. Chromatin immunoprecipitation sequencing (ChIP-seq), a major application of NGS, is a widely adopted technique for examining protein-DNA interactions and is commonly used to investigate epigenetic signatures of diffuse histone marks. These datasets have notoriously high variance and subtle levels of enrichment across large expanses, making enriched regions exceedingly difficult to define. Window-based heuristic models and finite-state hidden Markov models (HMMs) have been used with some success in analyzing ChIP-seq data but with lingering limitations. To improve the ability to detect broad regions of enrichment, we developed a stochastic Bayesian Change-Point (BCP) method, which addresses some of these unresolved issues. BCP makes use of recent advances in infinite-state HMMs by obtaining explicit formulas for posterior means of read densities. These posterior means can be used to categorize the genome into enriched and unenriched segments, as is customarily done, or examined for more detailed relationships, since the underlying subpeaks are preserved rather than simplified into a binary classification. BCP performs a near-exhaustive search of all possible change points between different posterior means at high resolution to minimize the subjectivity of window sizes, and it is computationally efficient due to a speed-up algorithm and the explicit formulas it employs. In the absence of a well-established "gold standard" for diffuse histone mark enrichment, we corroborated BCP's island detection accuracy and reproducibility using various forms of empirical evidence. We show that BCP is especially suited for analysis of diffuse histone ChIP-seq data but also effective in analyzing punctate transcription factor ChIP datasets, making it widely applicable for numerous experiment types.
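
To make the idea of "posterior means of read densities" concrete, the following is a minimal sketch of a generic Bayesian change-point forward filter on read counts. It is not the authors' BCP implementation. It assumes Poisson counts with a Gamma prior on the read density and a fixed per-position change probability, and it omits any smoothing pass and the speed-up algorithm mentioned in the abstract. The function names, the parameters `p`, `alpha`, `beta`, and `threshold`, and the toy counts are all hypothetical choices made for illustration.

```python
import math


def nb_log_pred(y, a, b):
    """Log predictive probability of count y when the Poisson rate
    has a Gamma(shape=a, rate=b) distribution (negative binomial)."""
    return (math.lgamma(a + y) - math.lgamma(a) - math.lgamma(y + 1)
            + a * math.log(b / (b + 1.0)) - y * math.log(b + 1.0))


def filtered_means(counts, p=0.1, alpha=1.0, beta=1.0):
    """Filtered posterior mean of the read density at each position.

    Assumed model (illustrative only): at every position a new segment
    starts with probability p, and its density is drawn from
    Gamma(alpha, beta). Cost is O(n^2); no pruning or speed-up is used.
    """
    runs = []   # one (log_weight, shape, rate) per candidate segment start
    means = []
    for y in counts:
        # Segments that continue through this position (probability 1 - p).
        updated = [(lw + math.log(1.0 - p) + nb_log_pred(y, a, b), a + y, b + 1.0)
                   for lw, a, b in runs]
        # A segment that starts at this position (probability p).
        updated.append((math.log(p) + nb_log_pred(y, alpha, beta),
                        alpha + y, beta + 1.0))
        # Normalise the weights in log space to avoid underflow.
        top = max(lw for lw, _, _ in updated)
        log_norm = top + math.log(sum(math.exp(lw - top) for lw, _, _ in updated))
        runs = [(lw - log_norm, a, b) for lw, a, b in updated]
        # Mixture of Gamma posterior means, weighted by segment-start probability.
        means.append(sum(math.exp(lw) * a / b for lw, a, b in runs))
    return means


def enriched_segments(means, threshold):
    """Merge consecutive positions whose mean exceeds threshold into
    half-open (start, end) intervals."""
    segments, start = [], None
    for i, m in enumerate(means):
        if m > threshold and start is None:
            start = i
        elif m <= threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(means)))
    return segments


if __name__ == "__main__":
    # Hypothetical toy counts: low background, an elevated stretch, low again.
    toy = [1, 0, 2, 1, 0, 1, 1, 0,
           9, 11, 10, 8, 12, 10, 9, 11,
           1, 0, 1, 2, 0, 1, 0, 1]
    toy_means = filtered_means(toy)
    print([round(m, 2) for m in toy_means])
    print(enriched_segments(toy_means, threshold=3.0))
```

Because the posterior means are kept as a per-position profile, thresholding them into enriched and unenriched segments is only one possible use; the same profile could be inspected directly for finer structure, which is the property the abstract highlights.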

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dacb/3406014/e2836cb72462/pcbi.1002613.g001.jpg
