Using predicted shape string to enhance the accuracy of γ-turn prediction.

Affiliation

Department of Chemistry, Tongji University, Room 438, No.1239, Siping Road, Shanghai, 200092, People's Republic of China.

Publication details

Amino Acids. 2012 May;42(5):1749-55. doi: 10.1007/s00726-011-0889-z. Epub 2011 Mar 22.

Abstract

Numerous methods for predicting γ-turns in proteins have been developed. However, their performance is generally modest, with a Matthews correlation coefficient (MCC) ≤ 0.18. Here, we develop a method to improve the accuracy of γ-turn prediction. First, we use the geometric mean metric as the optimization criterion for evaluating the performance of a support vector machine on the highly imbalanced γ-turn dataset. This metric seeks to maximize both sensitivity and specificity while keeping them balanced. Second, we designed a predictor that generates a protein's shape string by structure alignment against a protein structure database, and we introduce the predicted shape string as a new input variable for γ-turn prediction. Combining these two ideas, we developed a new γ-turn prediction method. Trained and tested on a benchmark dataset of 320 non-homologous protein chains using fivefold cross-validation, the method reaches an overall prediction accuracy Qtotal of 92.2% and an MCC of 0.38, outperforming existing γ-turn prediction methods. Our results indicate that the protein shape string is useful for predicting tight turns in proteins, and that dihedral-angle information is a reasonable input variable for machine-learning prediction of protein folding. The dataset used in this work and the software for generating predicted shape strings from the structure database are freely available from the anonymous FTP site ftp://cheminfo.tongji.edu.cn/GammaTurnPrediction/.
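The abstract compares methods by Qtotal and MCC and selects the SVM using a geometric-mean criterion built from sensitivity and specificity. The following is a minimal Python sketch of these metrics, assuming the usual confusion-matrix definitions over per-residue binary labels (γ-turn vs non-turn) and taking the geometric mean as sqrt(sensitivity × specificity). The function name `turn_metrics` and the counts in the usage example are hypothetical illustrations, not the authors' code or data.

```python
import math


def turn_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Per-residue metrics for a binary gamma-turn (positive) vs non-turn classifier.

    Hypothetical helper using the standard confusion-matrix definitions;
    any ratio with a zero denominator is reported as 0.0 instead of raising.
    """
    total = tp + tn + fp + fn
    # Sensitivity: fraction of true gamma-turn residues predicted as turns.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    # Specificity: fraction of non-turn residues predicted as non-turns.
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # Qtotal: overall fraction of correctly classified residues.
    q_total = (tp + tn) / total if total else 0.0
    # Geometric mean: high only when sensitivity and specificity are both high.
    g_mean = math.sqrt(sensitivity * specificity)
    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "q_total": q_total,
        "g_mean": g_mean,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts: 50 turn residues among 1000; the classifier predicts "non-turn" everywhere.
    trivial = turn_metrics(tp=0, tn=950, fp=0, fn=50)
    # Hypothetical counts for a classifier that does find some turns.
    useful = turn_metrics(tp=30, tn=900, fp=50, fn=20)
    for name, m in (("trivial", trivial), ("useful", useful)):
        print(name, {k: round(v, 3) for k, v in m.items()})
```

With these made-up counts, the trivial all-non-turn classifier scores Qtotal = 0.95, yet its sensitivity, geometric mean and MCC are all 0. The second classifier has a lower Qtotal (0.93) but a geometric mean of roughly 0.75 and an MCC of roughly 0.44. This illustrates why accuracy alone is a poor selection criterion on a highly imbalanced dataset such as the γ-turn data.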
