Krogh A, Brown M, Mian I S, Sjölander K, Haussler D
Sinsheimer Laboratories, University of California, Santa Cruz 95064.
J Mol Biol. 1994 Feb 4;235(5):1501-31. doi: 10.1006/jmbi.1994.1104.
Hidden Markov Models (HMMs) are applied to the problems of statistical modeling, database searching and multiple sequence alignment of protein families and protein domains. These methods are demonstrated on the globin family, the protein kinase catalytic domain, and the EF-hand calcium binding motif. In each case the parameters of an HMM are estimated from a training set of unaligned sequences. After the HMM is built, it is used to obtain a multiple alignment of all the training sequences. It is also used to search the SWISS-PROT 22 database for other sequences that are members of the given protein family, or contain the given domain. The HMM produces multiple alignments of good quality that agree closely with the alignments produced by programs that incorporate three-dimensional structural information. When employed in discrimination tests (by examining how closely the sequences in a database fit the globin, kinase and EF-hand HMMs), the HMM is able to distinguish members of these families from non-members with a high degree of accuracy. Both the HMM and PROFILESEARCH (a technique used to search for relationships between a protein sequence and multiply aligned sequences) perform better in these tests than PROSITE (a dictionary of sites and patterns in proteins). The HMM appears to have a slight advantage over PROFILESEARCH in terms of lower rates of false negatives and false positives, even though the HMM is trained using only unaligned sequences, whereas PROFILESEARCH requires aligned training sequences. Our results suggest the presence of an EF-hand calcium binding motif in a highly conserved and evolutionarily preserved putative intracellular region of 155 residues in the alpha-1 subunit of L-type calcium channels, which play an important role in excitation-contraction coupling. This region has been suggested to contain the functional domains that are typical or essential for all L-type calcium channels regardless of whether they couple to ryanodine receptors, conduct ions or both.
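
The discrimination test described above, which asks how closely a sequence fits a family HMM, can be illustrated with a minimal sketch. It assumes a toy two-state HMM over a two-letter alphabet (H, P) and scores each sequence by its forward-algorithm log-likelihood relative to a uniform background model. The state names, the alphabet, every probability, and the log-odds form of the score are hypothetical choices made for illustration; they are not the models, parameters, or scoring procedure used in the paper.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(seq, start, trans, emit):
    """log P(seq | HMM), summed over all state paths (forward algorithm)."""
    states = list(start)
    # Initialization: start probability times emission of the first symbol.
    alpha = {s: safe_log(start[s]) + safe_log(emit[s].get(seq[0], 0.0))
             for s in states}
    # Recursion: propagate through transitions, then emit the next symbol.
    for symbol in seq[1:]:
        alpha = {
            s: log_sum_exp([alpha[r] + safe_log(trans[r].get(s, 0.0))
                            for r in states])
               + safe_log(emit[s].get(symbol, 0.0))
            for s in states
        }
    # Termination: sum over the final state (this toy model has no end state).
    return log_sum_exp(list(alpha.values()))


def log_odds(seq, start, trans, emit, background):
    """Log-likelihood under the HMM minus log-likelihood under an i.i.d. null."""
    null = sum(safe_log(background[c]) for c in seq)
    return forward_log_likelihood(seq, start, trans, emit) - null


# Hypothetical toy "family" model that favors alternating H and P.
START = {"A": 0.9, "B": 0.1}
TRANS = {"A": {"A": 0.1, "B": 0.9}, "B": {"A": 0.9, "B": 0.1}}
EMIT = {"A": {"H": 0.9, "P": 0.1}, "B": {"H": 0.1, "P": 0.9}}
BACKGROUND = {"H": 0.5, "P": 0.5}

member_score = log_odds("HPHPHP", START, TRANS, EMIT, BACKGROUND)
non_member_score = log_odds("HHHHHH", START, TRANS, EMIT, BACKGROUND)
print("member-like sequence:", round(member_score, 3))
print("non-member-like sequence:", round(non_member_score, 3))
```

In this sketch a sequence that matches the modeled pattern gets a positive score and one that does not gets a negative score, so a threshold on the score separates the two. A real database search would use an HMM trained on the family's sequences and would need to account for sequence length when comparing scores.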