Kawabata T, Doi J
Department of Biotechnology, University of Tokyo, Japan.
Proteins. 1997 Jan;27(1):36-46. doi: 10.1002/(sici)1097-0134(199701)27:1<36::aid-prot5>3.0.co;2-l.
We propose a binary word encoding to improve protein secondary structure prediction. The encoding maps a local amino acid sequence to a binary word, a string of 0s and 1s, by applying an encoding function that assigns each amino acid either 0 or 1. With this encoding, we can statistically extract multiresidue information, that is, information that depends on more than one residue. We combine the binary word encoding with the GOR method, with a modified version of GOR that shows better accuracy, and with the neural network method. The binary word encoding improves the accuracy of GOR by 2.8%, and we obtain similar improvements when we combine it with the modified GOR method and the neural network method. The encoding similarly improves accuracy when multiple sequence alignment data are used. The accuracy of our best combined method is 68.2%. Because this paper shows improvement only for the GOR and neural network methods, we cannot claim that the encoding improves other methods. Nevertheless, the improvement suggests that multiresidue interactions affect the formation of secondary structure. In addition, we find that the optimal encoding function obtained by simulated annealing relates to nonpolarity, which indicates that nonpolarity is important to the multiresidue interaction.
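The encoding step described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the abstract does not give the optimized encoding function, the window width, or the way word statistics enter the GOR or neural network methods. The `NONPOLAR` set, the function names, and the default width of 5 are hypothetical; the set is chosen only because the abstract reports that the optimal encoding relates to nonpolarity.

```python
from collections import Counter

# Assumption: an illustrative set of nonpolar residues. The paper's optimal
# encoding function (found by simulated annealing) is not given in the abstract.
NONPOLAR = set("AVLIMFWC")


def encode_residue(aa):
    """Map one amino acid (one-letter code) to 1 if nonpolar, else 0."""
    return 1 if aa in NONPOLAR else 0


def binary_words(sequence, width=5):
    """Return the binary word for every window of `width` residues."""
    bits = "".join(str(encode_residue(aa)) for aa in sequence)
    return [bits[i:i + width] for i in range(len(bits) - width + 1)]


def word_state_counts(sequence, states, width=5):
    """Count co-occurrences of each binary word with the secondary-structure
    state of the window's central residue (hypothetical statistic)."""
    counts = Counter()
    half = width // 2
    for i, word in enumerate(binary_words(sequence, width)):
        counts[(word, states[i + half])] += 1
    return counts
```

For example, `binary_words("ALKEV", width=3)` yields the words `110`, `100`, and `001` under this assumed encoding. Counts of the kind returned by `word_state_counts` are one plausible way to gather the multiresidue statistics the abstract mentions; how the authors actually fold them into each predictor is not specified in this record.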