Using hidden Markov models to align multiple sequences.

Author information

Mount David W

Publication information

Cold Spring Harb Protoc. 2009 Jul;2009(7):pdb.top41. doi: 10.1101/pdb.top41.

PMID:20147223

Abstract

A hidden Markov model (HMM) is a probabilistic model of a multiple sequence alignment (MSA) of proteins. In the model, each column of symbols in the alignment is represented by a frequency distribution over the symbols (called a "state"), and insertions and deletions are represented by other states. One moves through the model along a particular path from state to state in a Markov chain (i.e., the next move is chosen at random, with probabilities that depend only on the current state), trying to match a given sequence. At each state the next matching symbol is chosen, and both its probability (frequency) and the probability of reaching that state from the previous one (the transition probability) are recorded. State and transition probabilities are multiplied to obtain the probability of the given sequence along that path. The model is "hidden" because the value of a specific state is not known; it is instead represented by a probability distribution over all possible values. This article discusses the advantages and disadvantages of HMMs in MSA and presents algorithms for calculating an HMM and the conditions for producing the best HMM.
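
The multiplication of state and transition probabilities described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the article: the toy model with two match states (`M1`, `M2`) and one insert state (`I1`), the four-letter amino acid alphabet, all probability values, and the helper name `path_probability`. Delete states, which emit no symbol, are left out to keep the example short.

```python
from math import isclose

# Hypothetical emission probabilities: one frequency distribution per state.
# Match states model alignment columns; the insert state is uniform here.
emissions = {
    "M1": {"A": 0.7, "G": 0.1, "L": 0.1, "V": 0.1},
    "I1": {"A": 0.25, "G": 0.25, "L": 0.25, "V": 0.25},
    "M2": {"A": 0.1, "G": 0.7, "L": 0.1, "V": 0.1},
}

# Hypothetical transition probabilities between states.
transitions = {
    ("BEGIN", "M1"): 1.0,
    ("M1", "M2"): 0.8,
    ("M1", "I1"): 0.2,
    ("I1", "I1"): 0.3,
    ("I1", "M2"): 0.7,
    ("M2", "END"): 1.0,
}


def path_probability(sequence, path):
    """Probability of emitting `sequence` along one path of emitting states."""
    if len(sequence) != len(path):
        raise ValueError("each symbol needs exactly one emitting state")
    prob = 1.0
    prev = "BEGIN"
    for symbol, state in zip(sequence, path):
        # Probability of moving into this state from the previous one.
        prob *= transitions.get((prev, state), 0.0)
        # Probability (frequency) of the symbol in this state.
        prob *= emissions[state].get(symbol, 0.0)
        prev = state
    # Close the path by moving to the END state.
    prob *= transitions.get((prev, "END"), 0.0)
    return prob


# "AG" matched column by column: 1.0 * 0.7 * 0.8 * 0.7 * 1.0
p_direct = path_probability("AG", ["M1", "M2"])

# "ALG" with L placed in the insert state:
# 1.0 * 0.7 * 0.2 * 0.25 * 0.7 * 0.7 * 1.0
p_insert = path_probability("ALG", ["M1", "I1", "M2"])

print(p_direct, p_insert)
```

In this toy model the direct path gives about 0.392 and the path through the insert state about 0.01715. A full treatment would consider every path that could produce the sequence, since the path actually taken is not observed.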
