Optimisation of HMM topologies enhances DNA and protein sequence modelling.

Authors

Friedrich Torben, Koetschan Christian, Müller Tobias

Affiliation

University of Würzburg.

Publication information

Stat Appl Genet Mol Biol. 2010;9:Article 6. doi: 10.2202/1544-6115.1480. Epub 2010 Jan 6.

DOI:10.2202/1544-6115.1480

PMID:20196756

Abstract

Hidden Markov models (HMMs) play a major role in applications to unravel biomolecular functionality. Though HMMs are technically mature and widely applied in computational biology, there is a potential of methodical optimisation concerning its modelling of biological data sources with varying sequence lengths. Single building blocks of these models, the states, are associated with a certain holding time, being the link to the length distribution of represented sequence motifs. An adaptation of regular HMM topologies to bell-shaped sequence lengths is achieved by a serial chain-linking of hidden states, while residing in the class of conventional hidden Markov models. The factor of the repetition of states (r) and the parameter for state-specific duration of stay (p) are determined by fitting the distribution of sequence lengths with the method of moments (MM) and maximum likelihood (ML). Performance evaluations of differently adjusted HMM topologies underline the impact of an optimisation for HMMs based on sequence lengths. Secondary structure prediction on internal transcribed spacer 2 sequences demonstrates exemplarily the general impact of topological optimisations. In summary, we propose a general methodology to improve the modelling behaviour of HMMs by topological optimisation with ML and a fast and easily implementable moment estimator.
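
The moment-based fit of r and p mentioned in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: p is taken to be the self-transition probability of each state, so that one state's holding time is geometric on 1, 2, ... and the length produced by a chain of r identical states follows a shifted negative binomial distribution. The function names `fit_chain_by_moments` and `sample_length` are hypothetical, and the estimator shown is a generic method-of-moments calculation, not necessarily the one derived in the paper.

```python
import random
from statistics import mean, pvariance


def fit_chain_by_moments(lengths):
    """Moment estimates (r, p) for a chain of r identical states.

    Assumption (not from the source): p is the self-transition probability,
    so one state's holding time is geometric on 1, 2, ... with mean 1/(1-p)
    and variance p/(1-p)^2, and the chain length is the sum of r such times.
    """
    mu = mean(lengths)
    var = pvariance(lengths)
    # Matching mean r/(1-p) and variance r*p/(1-p)^2 gives
    # r = mu^2 / (mu + var); r must be a positive integer.
    r = max(1, round(mu * mu / (mu + var)))
    # With r fixed, re-solve the mean equation for p, clamped to [0, 1).
    p = max(0.0, 1.0 - r / mu)
    return r, p


def sample_length(r, p, rng):
    """Simulate one sequence length from the chain of r states."""
    total = 0
    for _ in range(r):
        total += 1                # each state emits at least once
        while rng.random() < p:   # stay in the state with probability p
            total += 1
    return total


# Synthetic check: draw lengths from a known chain, then recover (r, p).
rng = random.Random(0)
lengths = [sample_length(5, 0.8, rng) for _ in range(20000)]
r_hat, p_hat = fit_chain_by_moments(lengths)
print(r_hat, round(p_hat, 3))
```

Rounding r to an integer and then re-solving for p keeps the fitted mean length equal to the sample mean; a maximum likelihood fit, as also used in the paper, would instead compare the likelihood across candidate integer values of r.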

Similar articles

HMM-ModE--improved classification using profile hidden Markov models by optimising the discrimination threshold and modifying emission probabilities with negative training sequences.

BMC Bioinformatics. 2007 Mar 27;8:104. doi: 10.1186/1471-2105-8-104.

Training HMM structure with genetic algorithm for biological sequence analysis.

Bioinformatics. 2004 Dec 12;20(18):3613-9. doi: 10.1093/bioinformatics/bth454. Epub 2004 Aug 5.

Combined prediction of transmembrane topology and signal peptide of beta-barrel proteins: using a hidden Markov model and genetic algorithms.

Comput Biol Med. 2010 Jul;40(7):621-8. doi: 10.1016/j.compbiomed.2010.04.006. Epub 2010 May 21.

Sequence-based protein structure prediction using a reduced state-space hidden Markov model.

Comput Biol Med. 2007 Sep;37(9):1211-24. doi: 10.1016/j.compbiomed.2006.10.014. Epub 2006 Dec 11.

Calibrating E-values for hidden Markov models using reverse-sequence null models.

Bioinformatics. 2005 Nov 15;21(22):4107-15. doi: 10.1093/bioinformatics/bti629. Epub 2005 Aug 25.

Hidden Markov models in biology.

Methods Mol Biol. 2010;609:241-53. doi: 10.1007/978-1-60327-241-4_14.

Using hidden Markov models to align multiple sequences.

Cold Spring Harb Protoc. 2009 Jul;2009(7):pdb.top41. doi: 10.1101/pdb.top41.

Discriminating between rate heterogeneity and interspecific recombination in DNA sequence alignments with phylogenetic factorial hidden Markov models.

Bioinformatics. 2005 Sep 1;21 Suppl 2:ii166-72. doi: 10.1093/bioinformatics/bti1127.

Drifting Markov models with polynomial drift and applications to DNA sequences.

Stat Appl Genet Mol Biol. 2008;7(1):Article6. doi: 10.2202/1544-6115.1326. Epub 2008 Feb 21.
