Krigbaum W R, Knutton S P
Proc Natl Acad Sci U S A. 1973 Oct;70(10):2809-13. doi: 10.1073/pnas.70.10.2809.
Multiple regression is used to obtain relations for predicting the amount of secondary structure in a protein molecule from its amino acid composition. We tested these relations on 18 proteins of known structure, in each case omitting the protein to be predicted. Independent predictions were made for the two subchains of hemoglobin and of insulin. The average errors for these 20 chains or subchains are: helix ± 7.1%, beta-sheet ± 6.9%, turn ± 4.2%, and coil ± 5.7%. A second set of relations, yielding somewhat inferior predictions, is given for the case in which Asp and Asn, and Glu and Gln, are not differentiated. Predictions are also listed for 15 proteins for which the amino acid sequence or tertiary structure is unknown.
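The testing procedure described above, fitting a regression on all proteins except the one being predicted, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the four composition variables and the coefficients are hypothetical, and the mean absolute error is only one plausible reading of "average error". It is not the authors' regression, variable selection, or data.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least-squares fit of y ~ b0 + X @ b; intercept is coef[0]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients (intercept first) to rows of X."""
    return coef[0] + X @ coef[1:]


def leave_one_out_predictions(X, y):
    """Predict each sample from a model fitted on all the other samples."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i          # drop the protein being predicted
        coef = fit_linear(X[keep], y[keep])
        preds[i] = predict(coef, X[i:i + 1])[0]
    return preds


# Synthetic stand-in data (NOT from the paper): 18 "proteins", each described
# by the percentage of four hypothetical residue types.
rng = np.random.default_rng(0)
n_proteins, n_features = 18, 4
X = rng.uniform(0.0, 15.0, size=(n_proteins, n_features))

# Hypothetical "true" relation plus noise, standing in for percent helix.
true_coef = np.array([20.0, 2.0, -1.5, 0.8, -0.5])
y = predict(true_coef, X) + rng.normal(0.0, 3.0, size=n_proteins)

preds = leave_one_out_predictions(X, y)
mean_abs_error = float(np.mean(np.abs(preds - y)))
print(f"leave-one-out mean absolute error: {mean_abs_error:.1f} percentage points")
```

Because each prediction comes from a model that never saw the protein in question, the resulting error estimates how the relation would perform on a new protein rather than how well it fits the proteins used to derive it.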