Nielsen H, Krogh A
Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark.
Proc Int Conf Intell Syst Mol Biol. 1998;6:122-30.
A hidden Markov model of signal peptides has been developed. It contains submodels for the N-terminal part, the hydrophobic region, and the region around the cleavage site. For known signal peptides, the model can be used to assign objective boundaries between these three regions. Applied to our data, the length distributions for the three regions are significantly different from expectations. For instance, the assigned hydrophobic region is between 8 and 12 residues long in almost all eukaryotic signal peptides. This analysis also makes apparent the differences between eukaryotes, Gram-positive bacteria, and Gram-negative bacteria. The model can be used to predict the location of the cleavage site, which it finds correctly in nearly 70% of signal peptides in a cross-validated test, almost the same accuracy as the best previous method. One of the problems for existing prediction methods is poor discrimination between signal peptides and uncleaved signal anchors; this discrimination is substantially improved when the hidden Markov model is expanded with a very simple signal anchor model.
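To illustrate how a hidden Markov model with one submodel per region can assign boundaries to a known signal peptide, the following is a minimal sketch of Viterbi decoding over three left-to-right states (n, h, c). It is not the published model: the residue classes, the emission and transition probabilities, and the names `EMIT`, `TRANS`, `residue_class`, and `label_regions` are all assumptions chosen for illustration, and the real model has a far richer architecture and trained parameters.

```python
import math

# Toy three-state, left-to-right HMM: n-region -> h-region -> c-region.
# All probabilities below are hypothetical illustration values,
# not parameters from the paper.
STATES = "nhc"

EMIT = {
    "n": {"hydrophobic": 0.20, "charged": 0.50, "other": 0.30},
    "h": {"hydrophobic": 0.80, "charged": 0.05, "other": 0.15},
    "c": {"hydrophobic": 0.30, "charged": 0.10, "other": 0.60},
}

# Only forward transitions are allowed, so each region is one contiguous run.
TRANS = {
    ("n", "n"): 0.8, ("n", "h"): 0.2,
    ("h", "h"): 0.9, ("h", "c"): 0.1,
    ("c", "c"): 1.0,
}


def residue_class(aa):
    """Map an amino acid letter to a coarse (assumed) residue class."""
    if aa in "AVLIFMWC":
        return "hydrophobic"
    if aa in "KRDEH":
        return "charged"
    return "other"


def label_regions(seq):
    """Return the most probable n/h/c label for each residue (Viterbi)."""
    if len(seq) < 3:
        raise ValueError("need at least three residues for three regions")
    neg_inf = float("-inf")

    # score[s]: best log-probability of any path ending in state s.
    # The path is forced to start in the n-region.
    score = {s: neg_inf for s in STATES}
    score["n"] = math.log(EMIT["n"][residue_class(seq[0])])
    back = []  # back[k][s]: best predecessor of state s at position k + 1

    for aa in seq[1:]:
        cls = residue_class(aa)
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev, best = None, neg_inf
            for p in STATES:
                t = TRANS.get((p, s))
                if t is None or score[p] == neg_inf:
                    continue
                cand = score[p] + math.log(t)
                if cand > best:
                    best_prev, best = p, cand
            pointers[s] = best_prev
            if best_prev is None:
                new_score[s] = neg_inf
            else:
                new_score[s] = best + math.log(EMIT[s][cls])
        score = new_score
        back.append(pointers)

    # The path is forced to end in the c-region; trace back from there.
    state = "c"
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    print(label_regions("KKKLLLLLLLLSSS"))
```

The region boundaries are read off the label string as the positions where the label changes. In this sketch, region lengths are governed only by the self-transition probabilities; it does not attempt cleavage-site prediction or the discrimination between signal peptides and signal anchors.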