Johannes Gutenberg-University of Mainz, 55128 Mainz, Germany.
J Chem Inf Model. 2011 Nov 28;51(11):3017-25. doi: 10.1021/ci200278w. Epub 2011 Oct 26.
The reasons for distortions from optimal α-helical geometry are largely unknown, but these distortions have a significant influence on structural changes in proteins. Hence, predicting them is a crucial problem in structural bioinformatics. For the particular case of kink prediction, we generated a data set of 132 membrane proteins containing 1014 manually labeled helices and examined the environment of kinks. Our sequence analysis confirms the great relevance of proline and reveals disproportionately high occurrences of glycine and serine at kink positions. The structural analysis shows significantly different mean solvent-accessible surface area values for kinked and nonkinked helices. More importantly, we used this data set to validate string kernels for support vector machines as a new kink prediction method. With the new predictor, about 80% of all helices were correctly classified as kinked or nonkinked, even when only small helical fragments were considered. The results exceed recently reported accuracies of alternative approaches and are a consequence of both the method and the data set.
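To make the idea of a string kernel concrete, the following is a minimal sketch of a k-spectrum kernel, one common string kernel that compares sequences by their shared substrings of length k. The abstract does not say which string kernel or parameters the authors used, so the kernel choice, the value `k=2`, the cosine normalization, and the function names are all assumptions made for illustration. The helix fragments are invented and are not taken from the data set.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count all overlapping substrings of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=2):
    """k-spectrum kernel: dot product of the k-mer count vectors of a and b."""
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


def normalized_kernel(a, b, k=2):
    """Cosine-normalized kernel, so fragments of different length are comparable."""
    denom = sqrt(spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))
    return spectrum_kernel(a, b, k) / denom if denom else 0.0


def gram_matrix(seqs, k=2):
    """Pairwise kernel values for a list of sequences."""
    return [[normalized_kernel(a, b, k) for b in seqs] for a in seqs]


# Hypothetical helix fragments (invented for illustration only).
fragments = ["LLAVPGLLA", "ALLPGSLLV", "LLAVLLAAL"]
K = gram_matrix(fragments)
```

A Gram matrix such as `K` could then be passed to any support vector machine implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, together with kinked/nonkinked labels for each fragment.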