A MAD-Bayes Algorithm for State-Space Inference and Clustering with Application to Querying Large Collections of ChIP-Seq Data Sets.

Authors

Zuo Chandler, Chen Kailei, Keleş Sündüz

Affiliations

Department of Statistics, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin.

Publication information

J Comput Biol. 2017 Jun;24(6):472-485. doi: 10.1089/cmb.2016.0138. Epub 2016 Nov 11.

DOI:10.1089/cmb.2016.0138

PMID:27835030

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5467113/

Abstract

Current analytic approaches for querying large collections of chromatin immunoprecipitation followed by sequencing (ChIP-seq) data from multiple cell types rely on analyzing each data set independently (i.e., peak calling). This approach ignores the fact that functional elements are frequently shared among related cell types and leads to overestimation of the extent of divergence between different ChIP-seq samples. Methods geared toward multisample investigations have limited applicability in settings that aim to integrate hundreds to thousands of ChIP-seq data sets for query loci (e.g., thousands of genomic loci with a specific binding site). Recently, Zuo et al. developed a hierarchical framework for state-space matrix inference and clustering, named MBASIC, to enable joint analysis of user-specified loci across multiple ChIP-seq data sets. Although this versatile framework both estimates the underlying state-space (e.g., bound vs. unbound) and groups loci with similar patterns together, its Expectation-Maximization-based estimation structure hinders its applicability with a large number of loci and samples. We address this limitation by developing a MAP-based asymptotic derivations from Bayes (MAD-Bayes) framework for MBASIC. This results in a K-means-like optimization algorithm that converges rapidly and hence enables exploring multiple initialization schemes and flexibility in tuning. Comparison with MBASIC indicates that this speed comes at a relatively insignificant loss in estimation accuracy. Although MAD-Bayes MBASIC is specifically designed for the analysis of user-specified loci, it is able to capture overall patterns of histone marks from multiple ChIP-seq data sets similar to those identified by genome-wide segmentation methods such as ChromHMM and Spectacle.
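
The abstract does not spell out the K-means-like algorithm itself. As a rough illustration of the kind of procedure that MAD-Bayes (small-variance asymptotic) derivations typically produce, the sketch below implements DP-means, the standard example of such an algorithm for a Dirichlet process mixture. It is not the MAD-Bayes MBASIC algorithm: the function name `dp_means`, the penalty parameter `lam`, and the toy data are all assumptions made for illustration.

```python
# Illustrative DP-means sketch (NOT the MAD-Bayes MBASIC algorithm).
# All names and the toy data below are hypothetical.

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def dp_means(data, lam, max_iter=100):
    """K-means-like clustering in which a point whose squared distance
    to every center exceeds `lam` opens a new cluster."""
    centers = [mean(data)]          # start with one global cluster
    labels = [0] * len(data)
    for _ in range(max_iter):
        changed = False
        # Assignment step: nearest center, or a new cluster if too far.
        for i, x in enumerate(data):
            dists = [sq_dist(x, c) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] > lam:
                centers.append(list(x))
                k = len(centers) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Update step: recompute centers and drop empty clusters.
        members = [[x for x, l in zip(data, labels) if l == k]
                   for k in range(len(centers))]
        keep = [k for k, m in enumerate(members) if m]
        remap = {old: new for new, old in enumerate(keep)}
        centers = [mean(members[k]) for k in keep]
        labels = [remap[l] for l in labels]
        if not changed:
            break
    return centers, labels


# Toy data: two well-separated groups of four points each.
toy = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11)]
centers, labels = dp_means(toy, lam=9.0)
print(len(centers), labels)
```

Unlike plain K-means, the number of clusters is not fixed in advance; it is governed by the penalty `lam`, which is the sort of tuning parameter that a fast algorithm makes cheap to explore across several values and initializations.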

Similar articles

Software for rapid time dependent ChIP-sequencing analysis (TDCA).

BMC Bioinformatics. 2017 Nov 25;18(1):521. doi: 10.1186/s12859-017-1936-x.

Comparing genome-wide chromatin profiles using ChIP-chip or ChIP-seq.

Bioinformatics. 2010 Apr 15;26(8):1000-6. doi: 10.1093/bioinformatics/btq087. Epub 2010 Mar 5.

Unified Analysis of Multiple ChIP-Seq Datasets.

Methods Mol Biol. 2021;2198:451-465. doi: 10.1007/978-1-0716-0876-0_33.

Chromatin Immunoprecipitation and High-Throughput Sequencing (ChIP-Seq): Tips and Tricks Regarding the Laboratory Protocol and Initial Downstream Data Analysis.

Methods Mol Biol. 2018;1767:271-288. doi: 10.1007/978-1-4939-7774-1_15.

ChIP-BIT: Bayesian inference of target genes using a novel joint probabilistic model of ChIP-seq profiles.

Nucleic Acids Res. 2016 Apr 20;44(7):e65. doi: 10.1093/nar/gkv1491. Epub 2015 Dec 23.

A Hierarchical Framework for State-Space Matrix Inference and Clustering.

Ann Appl Stat. 2016 Sep;10(3):1348-1372. doi: 10.1214/16-AOAS938. Epub 2016 Sep 28.

Seqinspector: position-based navigation through the ChIP-seq data landscape to identify gene expression regulators.

BMC Bioinformatics. 2016 Feb 12;17:85. doi: 10.1186/s12859-016-0938-4.

Genome-wide localization of protein-DNA binding and histone modification by a Bayesian change-point method with ChIP-seq data.

PLoS Comput Biol. 2012;8(7):e1002613. doi: 10.1371/journal.pcbi.1002613. Epub 2012 Jul 26.

HiChIP: a high-throughput pipeline for integrative analysis of ChIP-Seq data.

BMC Bioinformatics. 2014 Aug 15;15(1):280. doi: 10.1186/1471-2105-15-280.

References cited in this article

A Hierarchical Framework for State-Space Matrix Inference and Clustering.

Ann Appl Stat. 2016 Sep;10(3):1348-1372. doi: 10.1214/16-AOAS938. Epub 2016 Sep 28.

A Statistical Framework for the Analysis of ChIP-Seq Data.

J Am Stat Assoc. 2011;106(495):891-903. doi: 10.1198/jasa.2011.ap09706. Epub 2012 Jan 24.

Hematopoietic Signaling Mechanism Revealed from a Stem/Progenitor Cell Cistrome.

Mol Cell. 2015 Jul 2;59(1):62-74. doi: 10.1016/j.molcel.2015.05.020. Epub 2015 Jun 11.

Spectacle: fast chromatin state annotation using spectral learning.

Genome Biol. 2015 Feb 12;16(1):33. doi: 10.1186/s13059-015-0598-0.

hiHMM: Bayesian non-parametric joint inference of chromatin state maps.

Bioinformatics. 2015 Jul 1;31(13):2066-74. doi: 10.1093/bioinformatics/btv117. Epub 2015 Feb 27.

Integrative analysis of 111 reference human epigenomes.

Nature. 2015 Feb 19;518(7539):317-30. doi: 10.1038/nature14248.

dCaP: detecting differential binding events in multiple conditions and proteins.

BMC Genomics. 2014;15 Suppl 9(Suppl 9):S12. doi: 10.1186/1471-2164-15-S9-S12. Epub 2014 Dec 8.

Joint analysis of differential gene expression in multiple studies using correlation motifs.

Biostatistics. 2015 Jan;16(1):31-46. doi: 10.1093/biostatistics/kxu038. Epub 2014 Aug 19.

An integrated model of multiple-condition ChIP-Seq data reveals predeterminants of Cdx2 binding.

PLoS Comput Biol. 2014 Mar 27;10(3):e1003501. doi: 10.1371/journal.pcbi.1003501. eCollection 2014 Mar.

Joint modeling of ChIP-seq data via a Markov random field model.

Biostatistics. 2014 Apr;15(2):296-310. doi: 10.1093/biostatistics/kxt047. Epub 2013 Oct 30.
