Cheriton School of Computer Science, University of Waterloo, 200 University Avenue W, Waterloo, Ontario, Canada N2L 3G1.
BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S28. doi: 10.1186/1471-2105-11-S1-S28.
Traditional algorithms for hidden Markov model decoding seek to maximize either the probability of a state path or the number of positions of a sequence assigned to the correct state. These algorithms provide only a single answer and in practice do not produce good results.
We explore an alternative approach, in which we efficiently compute the k highest-probability paths that explain a sequence and then either use those paths to explore alternative explanations of the sequence or combine them into a single explanation. Our procedure uses an online pruning technique to reduce the use of primary memory.
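To make the idea of the k highest-probability paths concrete, the following is a minimal sketch of a k-best extension of Viterbi decoding in log space. It is an illustration under stated assumptions, not the procedure of the paper: it stores every candidate path in full and does not implement the online pruning technique. The function name `k_best_paths`, its parameters, and the two-state toy model are all hypothetical.

```python
import heapq
import math


def k_best_paths(obs, states, log_start, log_trans, log_emit, k):
    """Return up to k (log_prob, path) pairs, highest probability first.

    Hypothetical helper: keeps whole paths in memory, so it does NOT
    reproduce the memory-saving online pruning described in the text.
    """
    # beams[s] holds up to k (log_prob, path) candidates that end in state s.
    beams = {s: [(log_start[s] + log_emit[s][obs[0]], (s,))] for s in states}
    for symbol in obs[1:]:
        new_beams = {}
        for s in states:
            # Any k-best path ending in s extends a k-best path ending in
            # some previous state, so extending the old beams is sufficient.
            candidates = (
                (lp + log_trans[prev][s] + log_emit[s][symbol], path + (s,))
                for prev in states
                for lp, path in beams[prev]
            )
            new_beams[s] = heapq.nlargest(k, candidates, key=lambda c: c[0])
        beams = new_beams
    final = (c for s in states for c in beams[s])
    return heapq.nlargest(k, final, key=lambda c: c[0])


# Toy model (assumed values): states "in"/"out", symbols "h"/"p".
log = math.log
states = ("in", "out")
log_start = {"in": log(0.5), "out": log(0.5)}
log_trans = {
    "in": {"in": log(0.8), "out": log(0.2)},
    "out": {"in": log(0.3), "out": log(0.7)},
}
log_emit = {
    "in": {"h": log(0.9), "p": log(0.1)},
    "out": {"h": log(0.2), "p": log(0.8)},
}
top = k_best_paths("hhp", states, log_start, log_trans, log_emit, k=3)
for lp, path in top:
    print(f"{math.exp(lp):.5f}", path)
```

With k = 1 this reduces to ordinary Viterbi decoding. Keeping k candidates per state at every position is what makes memory the bottleneck for long sequences.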
Our algorithm uses much less memory than the naive approach. For membrane proteins, even simple path combination algorithms give good explanations, and examining the paths being combined also gives a sense of confidence in the explanation. For proteins with two topologies, the k best paths can give insight into both correct explanations of a sequence, a feature lacking from traditional algorithms in this domain.
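As an illustration of what a simple path combination might look like, the sketch below takes a probability-weighted vote at each position over a set of scored paths and reports the share of weight behind the winning state as a rough confidence. This is one plausible rule chosen for illustration and is not claimed to be the combination algorithm evaluated in the paper. The name `combine_paths` and the toy scores are hypothetical, and the combined labeling is not guaranteed to be a valid path of the underlying model.

```python
import math
from collections import defaultdict


def combine_paths(scored_paths):
    """Weighted per-position vote over (log_prob, path) pairs.

    Hypothetical helper: returns (labels, confidence), where confidence[i]
    is the fraction of total path weight that agrees with labels[i].
    """
    # Subtract the best log-probability before exponentiating to avoid underflow.
    best_lp = max(lp for lp, _ in scored_paths)
    weights = [math.exp(lp - best_lp) for lp, _ in scored_paths]
    total = sum(weights)
    length = len(scored_paths[0][1])
    labels, confidence = [], []
    for i in range(length):
        votes = defaultdict(float)
        for w, (_, path) in zip(weights, scored_paths):
            votes[path[i]] += w
        state, weight = max(votes.items(), key=lambda kv: kv[1])
        labels.append(state)
        confidence.append(weight / total)
    return labels, confidence


# Toy input (assumed values): three equal-length paths with their log-probabilities.
scored = [
    (math.log(0.05184), ("in", "in", "out")),
    (math.log(0.02592), ("in", "in", "in")),
    (math.log(0.01008), ("in", "out", "out")),
]
labels, confidence = combine_paths(scored)
print(labels)
print([round(c, 3) for c in confidence])
```

Positions where the k paths disagree get a confidence below 1, which is one way to flag regions where alternative explanations, such as a second topology, deserve a closer look.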