How good is prediction of protein structural class by the component-coupled method?

Authors

Wang Z X, Yuan Z

Affiliation

National Laboratory of Biomacromolecules, Institute of Biophysics, Academia Sinica, Beijing, People's Republic of China.

Publication information

Proteins. 2000 Feb 1;38(2):165-75. doi: 10.1002/(sici)1097-0134(20000201)38:2<165::aid-prot5>3.0.co;2-v.

Abstract

Proteins of known structure are usually classified into four structural classes: all-alpha, all-beta, alpha+beta, and alpha/beta. A number of methods for predicting the structural class of a protein from its amino acid composition have been developed during the past few years. Recently, a component-coupled method was developed for predicting protein structural class according to amino acid composition. This method is based on the least Mahalanobis distance principle and yields much better prediction results than the previous methods. However, the success rates reported for structural class prediction by different investigators are contradictory. The highest accuracies reported for this method are near 100%, but the lowest is only about 60%. The goal of this study is to resolve this paradox and to determine the possible upper limit of the prediction rate for structural classes. In this paper, based on the normality assumption and the Bayes decision rule for minimum error, a new method is proposed for predicting the structural class of a protein according to its amino acid composition. The detailed theoretical analysis indicates that if the amino acid compositions of the four protein folding classes follow normal distributions, the present method will yield the optimum predictive result in a statistical sense. A non-redundant data set of 1,189 protein domains is used to evaluate the performance of the new method. Our results demonstrate that 60% correctness is the upper limit for a four-class prediction from amino acid composition alone for an unknown query protein. The apparently high accuracy (more than 90%) attained in previous studies was due to the preselection of test sets, which may not be adequately representative of all unrelated proteins.
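
The difference between the least Mahalanobis distance principle and the Bayes decision rule under normality can be illustrated with a minimal sketch. The abstract gives no formulas, so the discriminant below is the standard textbook form for Gaussian classes (squared Mahalanobis distance plus the log-determinant of the class covariance minus twice the log prior) and is not necessarily the authors' exact formulation. The function names, the two toy classes, and their parameters are all hypothetical and chosen only to show the two rules disagreeing; they are not protein data.

```python
import numpy as np

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of x from a class with the given mean and covariance."""
    diff = x - mean
    return float(diff @ np.linalg.solve(cov, diff))

def bayes_score(x, mean, cov, prior):
    """Gaussian Bayes discriminant (smaller is better): distance + ln|cov| - 2 ln(prior)."""
    _, logdet = np.linalg.slogdet(cov)
    return mahalanobis_sq(x, mean, cov) + logdet - 2.0 * np.log(prior)

def predict(x, params, priors=None):
    """Pick the class with the smallest score.

    params maps label -> (mean, cov). With priors=None the rule is least
    Mahalanobis distance; otherwise it is the Gaussian Bayes rule.
    """
    scores = {}
    for label, (mean, cov) in params.items():
        if priors is None:
            scores[label] = mahalanobis_sq(x, mean, cov)
        else:
            scores[label] = bayes_score(x, mean, cov, priors[label])
    return min(scores, key=scores.get)

# Hypothetical 2-D toy classes (an assumption for illustration only).
params = {
    "tight": (np.array([0.0, 0.0]), 0.01 * np.eye(2)),
    "broad": (np.array([1.0, 0.0]), 100.0 * np.eye(2)),
}
query = np.array([0.25, 0.0])

print(predict(query, params))                                       # broad
print(predict(query, params, priors={"tight": 0.5, "broad": 0.5}))  # tight
```

In this toy case the broad class wins on distance alone because its large covariance shrinks every distance, while the log-determinant term in the Bayes score penalizes that spread. For real amino acid compositions the 20 fractions sum to one, so the full covariance matrix is singular; a practical version would have to drop one component or otherwise regularize, which this sketch does not do.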
