Mehta P K, Heringa J, Argos P
European Molecular Biology Laboratory, Heidelberg, Germany.
Protein Sci. 1995 Dec;4(12):2517-25. doi: 10.1002/pro.5560041208.
To improve secondary structure predictions in protein sequences, the information residing in multiple sequence alignments of substituted but structurally related proteins is exploited. A database comprising 70 protein families and a total of 2,500 sequences, some of which were aligned by tertiary structural superpositions, was used to calculate separate residue exchange weight matrices for alpha-helical, beta-strand, and coil substructures. Secondary structure predictions were made based on the observed residue substitutions in local regions of the multiple alignments and the largest possible associated exchange weights in each of the three matrix types. Comparison of the observed and predicted secondary structure on a per-residue basis yielded a mean accuracy of 72.2%. Individual alpha-helix, beta-strand, and coil states were respectively predicted at 66.7, and 75.8% correctness, representing a well-balanced three-state prediction. The accuracy level, verified by cross-validation through jack-knife tests on all protein families, dropped, on average, to only 70.9%, indicating the rigor of the prediction procedure. On the basis of robustness, conceptual clarity, accuracy, and executable efficiency, the method has considerable advantages, especially its sole reliance on amino acid substitutions within structurally related proteins.
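The prediction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy exchange weights, the window size, the rule of summing column scores over the window, and the names `column_score`, `predict`, and `per_residue_accuracy` are all assumptions made for illustration. The sketch scores each alignment column by the largest exchange weight among the residue pairs observed in it, sums those scores over a local window for each of the three state-specific matrices, and assigns the state with the highest total.

```python
from itertools import combinations

STATES = ("H", "E", "C")  # alpha-helix, beta-strand, coil

# Hypothetical toy exchange weights, one table per state (not values from the paper).
MATRICES = {
    "H": {("A", "L"): 2.0, ("A", "A"): 1.0, ("L", "L"): 1.0},
    "E": {("I", "V"): 2.0, ("V", "V"): 1.0, ("I", "I"): 1.0},
    "C": {("G", "P"): 2.0, ("G", "G"): 1.0, ("P", "P"): 1.0},
}

# Hypothetical aligned sequences of equal length ('-' would mark a gap).
ALIGNMENT = ["AAVVGG", "LAIVPG", "ALVIGP"]


def column_score(column, matrix):
    """Largest exchange weight over all residue pairs observed in one column."""
    residues = sorted({r for r in column if r != "-"})
    if not residues:
        return 0.0
    # Identical pairs are included so fully conserved columns still score.
    pairs = list(combinations(residues, 2)) + [(r, r) for r in residues]
    # Unknown pairs default to 0.0; lookup is symmetric in (a, b).
    return max(matrix.get((a, b), matrix.get((b, a), 0.0)) for a, b in pairs)


def predict(alignment, matrices, half_window=1):
    """Assign each position the state whose matrix gives the highest window total."""
    length = len(alignment[0])
    columns = [[seq[i] for seq in alignment] for i in range(length)]
    prediction = []
    for i in range(length):
        lo, hi = max(0, i - half_window), min(length, i + half_window + 1)
        totals = {
            s: sum(column_score(columns[j], matrices[s]) for j in range(lo, hi))
            for s in STATES
        }
        # Ties resolve to the first state in STATES order (an arbitrary choice).
        prediction.append(max(STATES, key=lambda s: totals[s]))
    return "".join(prediction)


def per_residue_accuracy(observed, predicted):
    """Percentage of positions where predicted and observed states agree."""
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)
```

Calling `predict(ALIGNMENT, MATRICES)` returns a three-state string that `per_residue_accuracy` can compare with an observed assignment. A jack-knife test in the sense of the abstract would rebuild the matrices with one protein family held out before predicting that family.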