Baldi P, Chauvin Y, Hunkapiller T, McClure M A
Division of Biology, California Institute of Technology, Pasadena 91125.
Proc Natl Acad Sci U S A. 1994 Feb 1;91(3):1059-63. doi: 10.1073/pnas.91.3.1059.
Hidden Markov model (HMM) techniques are used to model families of biological sequences. A smooth and convergent algorithm is introduced to iteratively adapt the transition and emission parameters of the models from the examples in a given family. The HMM approach is applied to three protein families: globins, immunoglobulins, and kinases. In all cases, the models derived capture the important statistical characteristics of the family and can be used for a number of tasks, including multiple alignments, motif detection, and classification. For K sequences of average length N, this approach yields an effective multiple-alignment algorithm which requires O(KN^2) operations, linear in the number of sequences.
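To make the scoring and cost claims concrete, here is a minimal sketch of the standard forward algorithm for a generic discrete HMM. It is not the paper's architecture or training procedure: the function name `forward_log_likelihood`, the dictionary layout, and the two-state toy model are all assumptions made for illustration. The inner loops visit each transition once per symbol, so a model with a number of states proportional to N and a bounded number of transitions per state would cost on the order of N^2 per sequence, which is consistent with the O(KN^2) total quoted for K sequences.

```python
import math


def forward_log_likelihood(seq, start, trans, emit):
    """Return log P(seq | model) for a discrete HMM (forward algorithm).

    start: {state: prob}
    trans: {state: {next_state: prob}}   (sparse: missing entries are 0)
    emit:  {state: {symbol: prob}}       (every state emits a symbol)
    """
    if not seq:
        return 0.0
    # alpha[s] is proportional to P(symbols so far, current state = s).
    alpha = {s: p * emit[s].get(seq[0], 0.0) for s, p in start.items()}
    log_like = 0.0
    for symbol in seq[1:]:
        # Rescale at every step to avoid underflow on long sequences.
        scale = sum(alpha.values())
        if scale == 0.0:
            return float("-inf")
        log_like += math.log(scale)
        nxt = {}
        for s, a in alpha.items():
            for t, p in trans.get(s, {}).items():
                contribution = (a / scale) * p * emit[t].get(symbol, 0.0)
                nxt[t] = nxt.get(t, 0.0) + contribution
        alpha = nxt
    total = sum(alpha.values())
    if total == 0.0:
        return float("-inf")
    return log_like + math.log(total)


# Hypothetical two-state, two-symbol toy model (not taken from the paper).
start = {"s1": 1.0}
trans = {"s1": {"s1": 0.5, "s2": 0.5}, "s2": {"s2": 1.0}}
emit = {"s1": {"a": 0.9, "b": 0.1}, "s2": {"a": 0.2, "b": 0.8}}

# Rank candidate sequences by log-likelihood under the toy model.
scores = {s: forward_log_likelihood(s, start, trans, emit) for s in ("ab", "ba")}
best = max(scores, key=scores.get)
```

Used this way, the log-likelihood acts as a simple classification score: sequences that fit the model receive higher values, and a threshold on that score separates likely family members from the rest. Producing an alignment would instead use the most probable state path (Viterbi), which has the same loop structure with `max` in place of the sum.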