Sparsely correlated hidden Markov models with application to genome-wide location studies.

Affiliation

National University of Singapore and National University Health System, Singapore 117597, Singapore.

Publication information

Bioinformatics. 2013 Mar 1;29(5):533-41. doi: 10.1093/bioinformatics/btt012. Epub 2013 Jan 16.

Abstract

MOTIVATION

Multiply correlated datasets have become increasingly common in genome-wide location analysis of regulatory proteins and epigenetic modifications. Their correlation can be directly incorporated into a statistical model to capture underlying biological interactions, but such modeling quickly becomes computationally intractable.

RESULTS

We present sparsely correlated hidden Markov models (scHMM), a novel method for performing simultaneous hidden Markov model (HMM) inference for multiple genomic datasets. In scHMM, a single HMM is assumed for each series, but the transition probability in each series depends not only on its own hidden states but also on the hidden states of other related series. For each series, scHMM uses penalized regression to select a subset of the other data series and estimate their effects on the odds of each transition in the given series. Hidden states are then inferred using a standard forward-backward algorithm, with the transition probabilities adjusted by the model at each position, which keeps the order of computation close to that of fitting independent HMMs (iHMM). Hence, scHMM is a collection of inter-dependent non-homogeneous HMMs, capable of giving a close approximation to a fully multivariate HMM fit. A simulation study shows that scHMM achieves sensitivity comparable to that of the multivariate HMM fit at a much lower computational cost. The method was demonstrated in the joint analysis of 39 histone modifications, CTCF and RNA polymerase II in human CD4+ T cells. scHMM reported fewer high-confidence regions than iHMM in this dataset, but it recovered previously characterized histone modifications in relevant genomic regions better than iHMM. In addition, the resulting combinatorial patterns from scHMM could be better mapped to the 51 states reported by the multivariate HMM method of Ernst and Kellis.
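The inference step described above can be illustrated with a minimal sketch. It is not the scHMM package's code: it assumes a two-state chain (0 = background, 1 = enriched), a logistic form for the transition odds, and regression coefficients that have already been estimated (the penalized-regression fit itself is not shown). The function names `position_specific_transitions` and `forward_backward`, and all parameter names, are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def position_specific_transitions(intercepts, coefs, other_states):
    """Return one 2x2 transition matrix per step t-1 -> t (assumed logistic form).

    intercepts[j]: baseline log-odds of entering state 1 from state j.
    coefs[j][m]: effect of series m on those log-odds; a zero means the
                 series was dropped by the sparsity penalty.
    other_states[t][m]: current 0/1 state estimate of series m at position t.
    """
    trans = []
    for t in range(1, len(other_states)):
        rows = []
        for j in (0, 1):
            logit = intercepts[j] + sum(
                c * z for c, z in zip(coefs[j], other_states[t]))
            p1 = sigmoid(logit)
            rows.append([1.0 - p1, p1])
        trans.append(rows)
    return trans

def forward_backward(emis, trans, init):
    """Posterior state probabilities with position-specific transitions.

    emis[t][k]: emission likelihood at position t under state k.
    trans[t][j][k]: probability of moving from state j at t to state k at t+1.
    init[k]: initial state distribution.
    """
    T, K = len(emis), len(init)
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            raw = [init[k] * emis[0][k] for k in range(K)]
        else:
            raw = [sum(alpha[t - 1][j] * trans[t - 1][j][k] for j in range(K))
                   * emis[t][k] for k in range(K)]
        s = sum(raw)  # rescale each step to avoid numerical underflow
        scale.append(s)
        alpha.append([a / s for a in raw])
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(trans[t][j][k] * emis[t + 1][k] * beta[t + 1][k]
                       for k in range(K)) / scale[t + 1] for j in range(K)]
    post = []
    for t in range(T):
        w = [alpha[t][k] * beta[t][k] for k in range(K)]
        z = sum(w)
        post.append([x / z for x in w])
    return post

# Toy data: two other series, only the first has a nonzero (selected) effect.
others = [[0, 0], [1, 0], [1, 1], [0, 1]]
trans = position_specific_transitions([-2.0, 2.0], [[1.5, 0.0], [1.5, 0.0]], others)
emis = [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7], [0.8, 0.2]]
posterior = forward_backward(emis, trans, [0.5, 0.5])
```

Because the only change from an ordinary HMM is that `trans` varies by position, each series costs about the same to decode as an independent HMM, which is the point the abstract makes about computational order.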

AVAILABILITY

The scHMM package can be freely downloaded from http://sourceforge.net/p/schmm/ and is recommended for use in a Linux environment.

Similar articles

MRHMMs: multivariate regression hidden Markov models and the variantS.
Bioinformatics. 2014 Jun 15;30(12):1755-6. doi: 10.1093/bioinformatics/btu070. Epub 2014 Feb 19.
