David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada.
BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S40. doi: 10.1186/1471-2105-11-S1-S40.
Existing hidden Markov model (HMM) decoding algorithms do not focus on identifying the boundaries of sequence features approximately correctly.
We give a set of algorithms to compute the total conditional probability of all labellings "near" a reference labelling λ of a sequence y, under a variety of definitions of "near". In addition, we give optimization algorithms to find the best labelling for a sequence in the robust sense of having all of its feature boundaries nearly correct. Natural problems in this domain are NP-hard to optimize. For membrane proteins, our algorithms find the approximate topology of such proteins with success comparable to existing programs, while being substantially more accurate in estimating the positions of transmembrane helix boundaries.
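The abstract does not spell out the algorithms. The following is a minimal sketch for one assumed definition of "near": a labelling is near λ if it has the same sequence of feature labels as λ and each feature boundary lies within δ positions of the corresponding boundary in λ. Under that assumption, the conditional probability can be computed with a forward pass over an expanded state space that also tracks which reference feature the path is currently in. A move into the next feature is only allowed within δ of the matching reference boundary. The names `segments`, `forward_total` and `prob_near`, and the toy three-state model, are hypothetical illustrations, not the authors' code.

```python
from itertools import groupby

def segments(labelling):
    """Split a per-position labelling into maximal runs: [(label, start), ...]."""
    segs, pos = [], 0
    for label, run in groupby(labelling):
        segs.append((label, pos))
        pos += len(list(run))
    return segs

def forward_total(init, trans, emit, y):
    """Standard forward algorithm; returns P(y). trans[r][s] = P(r -> s)."""
    n = len(init)
    f = [init[s] * emit[s][y[0]] for s in range(n)]
    for obs in y[1:]:
        f = [sum(f[r] * trans[r][s] for r in range(n)) * emit[s][obs]
             for s in range(n)]
    return sum(f)

def prob_near(init, trans, emit, state_label, y, ref, delta):
    """P(labelling is near ref | y), assuming "near" means: same run labels
    in the same order, and every boundary within delta of ref's boundary.
    Uses raw probabilities (no scaling), so it suits short toy sequences only."""
    segs = segments(ref)
    labels = [lab for lab, _ in segs]
    bounds = [start for _, start in segs[1:]]  # bounds[k] = start of run k+1
    m, n = len(segs), len(init)
    # f[k][s]: P(y[0..i], state s at i, path currently in reference run k)
    f = [[0.0] * n for _ in range(m)]
    for s in range(n):
        if state_label[s] == labels[0]:
            f[0][s] = init[s] * emit[s][y[0]]
    for i in range(1, len(y)):
        g = [[0.0] * n for _ in range(m)]
        for k in range(m):
            for s in range(n):
                if state_label[s] != labels[k]:
                    continue
                # Stay inside run k.
                total = sum(f[k][r] * trans[r][s] for r in range(n))
                # Start run k at position i, only near the reference boundary.
                if k > 0 and abs(i - bounds[k - 1]) <= delta:
                    total += sum(f[k - 1][r] * trans[r][s] for r in range(n))
                g[k][s] = total * emit[s][y[i]]
        f = g
    # Paths that never reached the last run are not near ref.
    return sum(f[m - 1]) / forward_total(init, trans, emit, y)

# Hypothetical toy model: 'o' = outside membrane, 'M' = membrane (two states).
init = [0.6, 0.3, 0.1]
trans = [[0.8, 0.15, 0.05], [0.1, 0.6, 0.3], [0.2, 0.3, 0.5]]
emit = [[0.7, 0.3], [0.2, 0.8], [0.4, 0.6]]
state_label = ['o', 'M', 'M']
y = [0, 1, 1, 1, 0, 0]
ref = ['o', 'M', 'M', 'M', 'o', 'o']

for delta in range(3):
    print(delta, prob_near(init, trans, emit, state_label, y, ref, delta))
```

With δ = 0, the sum reduces to the posterior probability of λ itself, and it can only grow as δ increases. The expanded state space costs O(n·m·|S|²) time for n positions, m reference features and |S| HMM states. Longer sequences would need scaling or log-space arithmetic.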
More robust HMM decoding may allow better analysis of sequence features in reasonable runtimes.