How good is prediction of protein structural class by the component-coupled method?

Authors

Wang Z X, Yuan Z

Affiliation

National Laboratory of Biomacromolecules, Institute of Biophysics, Academia Sinica, Beijing, People's Republic of China.

Publication information

Proteins. 2000 Feb 1;38(2):165-75. doi: 10.1002/(sici)1097-0134(20000201)38:2<165::aid-prot5>3.0.co;2-v.

DOI:10.1002/(sici)1097-0134(20000201)38:2<165::aid-prot5>3.0.co;2-v

PMID:10656263

Abstract

Proteins of known structure are usually classified into four structural classes: all-alpha, all-beta, alpha+beta, and alpha/beta. A number of methods for predicting the structural class of a protein from its amino acid composition have been developed during the past few years. Recently, a component-coupled method was developed for predicting protein structural class according to amino acid composition. This method is based on the least Mahalanobis distance principle and yields much better predictions than the previous methods. However, the success rates reported for structural class prediction by different investigators are contradictory. The highest reported accuracies for this method are near 100%, but the lowest is only about 60%. The goal of this study is to resolve this paradox and to determine the possible upper limit of the prediction rate for structural classes. In this paper, based on the normality assumption and the Bayes decision rule for minimum error, a new method is proposed for predicting the structural class of a protein according to its amino acid composition. The detailed theoretical analysis indicates that if the four protein folding classes are governed by normal distributions, the present method will yield the optimum predictive result in a statistical sense. A non-redundant data set of 1,189 protein domains is used to evaluate the performance of the new method. Our results demonstrate that 60% correctness is the upper limit for a 4-type class prediction from amino acid composition alone for an unknown query protein. The apparently high accuracy level (more than 90%) attained in the previous studies was due to the preselection of test sets, which may not be adequately representative of all unrelated proteins.
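The two decision rules named in the abstract can be illustrated with a minimal sketch. It assumes each structural class is modeled as a multivariate normal distribution over composition vectors, so that the minimum-error Bayes rule reduces to choosing the class with the smallest value of D^2 + ln|Sigma| - 2 ln P(class), where D^2 is the squared Mahalanobis distance. The function names (`composition`, `fit_class`, `predict`, and the helpers) are hypothetical and chosen for illustration; this is not necessarily the authors' exact formulation or code.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """19-component amino acid composition of a sequence.

    The 20 frequencies sum to 1, so one component is dropped here
    (an assumption of this sketch) to keep the covariance non-singular.
    """
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    return [c / total for c in counts[:-1]]

def cholesky(a):
    """Lower-triangular L with L * L^T = a (a must be positive definite)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def fit_class(vectors):
    """Estimate the mean vector and the Cholesky factor of the sample covariance."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(d)]
    cov = [[sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in vectors) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cholesky(cov)

def mahalanobis_sq(x, mean, L):
    """Squared Mahalanobis distance, via forward substitution L y = x - mean."""
    y = [0.0] * len(x)
    for i in range(len(x)):
        y[i] = (x[i] - mean[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return sum(v * v for v in y)

def log_det(L):
    """ln|Sigma| computed from the Cholesky factor of Sigma."""
    return 2.0 * sum(math.log(L[i][i]) for i in range(len(L)))

def predict(x, models, priors=None, rule="bayes"):
    """Pick the class with the smallest score.

    models maps label -> (mean, L). rule="mahalanobis" uses D^2 alone;
    rule="bayes" adds the ln|Sigma| and prior terms.
    """
    def score(label):
        mean, L = models[label]
        d2 = mahalanobis_sq(x, mean, L)
        if rule == "mahalanobis":
            return d2
        return d2 + log_det(L) - 2.0 * math.log(priors[label])
    return min(models, key=score)
```

In this sketch the two rules coincide only when all classes share the same prior and the same covariance determinant; otherwise a tightly clustered class and a diffuse class can be ranked differently by the two scores for the same query vector.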

Similar articles

Prediction of secondary structural content of proteins from their amino acid composition alone. II. The paradox with secondary structural class.

Proteins. 1996 Jun;25(2):169-79. doi: 10.1002/(SICI)1097-0134(199606)25:2<169::AID-PROT3>3.0.CO;2-D.

Prediction of secondary structural content of proteins from their amino acid composition alone. I. New analytic vector decomposition methods.

Proteins. 1996 Jun;25(2):157-68. doi: 10.1002/(SICI)1097-0134(199606)25:2<157::AID-PROT2>3.0.CO;2-F.

Prediction of protein (domain) structural classes based on amino-acid index.

Eur J Biochem. 1999 Dec;266(3):1043-9. doi: 10.1046/j.1432-1327.1999.00947.x.

Is it a paradox or misinterpretation?

Proteins. 2001 May 15;43(3):336-8. doi: 10.1002/prot.1045.

Prediction of protein structural classes.

Crit Rev Biochem Mol Biol. 1995;30(4):275-349. doi: 10.3109/10409239509083488.

Prediction of protein structural classes by a new measure of information discrepancy.

Comput Biol Chem. 2003 Jul;27(3):373-80. doi: 10.1016/s1476-9271(02)00087-7.

A weighting method for predicting protein structural class from amino acid composition.

Eur J Biochem. 1992 Dec 15;210(3):747-9. doi: 10.1111/j.1432-1033.1992.tb17476.x.

Prediction of protein folding class from amino acid composition.

Proteins. 1993 May;16(1):79-91. doi: 10.1002/prot.340160109.

Accurate prediction of protein structural classes by incorporating predicted secondary structure information into the general form of Chou's pseudo amino acid composition.

J Theor Biol. 2014 Mar 7;344:12-8. doi: 10.1016/j.jtbi.2013.11.021. Epub 2013 Dec 6.

Cited by

Prediction of protein structural classes by different feature expressions based on 2-D wavelet denoising and fusion.

BMC Bioinformatics. 2019 Dec 24;20(Suppl 25):701. doi: 10.1186/s12859-019-3276-5.

Statistical prediction of protein structural, localization and functional properties by the analysis of its fragment mass distributions after proteolytic cleavage.

Sci Rep. 2016 Feb 29;6:22286. doi: 10.1038/srep22286.

Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM.

Comput Math Methods Med. 2015;2015:370756. doi: 10.1155/2015/370756. Epub 2015 Dec 15.

Prediction of Protein Structural Class Based on Gapped-Dipeptides and a Recursive Feature Selection Approach.

Int J Mol Sci. 2015 Dec 24;17(1):15. doi: 10.3390/ijms17010015.

Improving protein fold recognition using the amalgamation of evolutionary-based and structural based information.

BMC Bioinformatics. 2014;15 Suppl 16(Suppl 16):S12. doi: 10.1186/1471-2105-15-S16-S12. Epub 2014 Dec 8.

PSSP-RFE: accurate prediction of protein structural class by recursive feature extraction from PSI-BLAST profile, physical-chemical property and functional annotations.

PLoS One. 2014 Mar 27;9(3):e92863. doi: 10.1371/journal.pone.0092863. eCollection 2014.

Proposing a highly accurate protein structural class predictor using segmentation-based features.

BMC Genomics. 2014;15 Suppl 1(Suppl 1):S2. doi: 10.1186/1471-2164-15-S1-S2. Epub 2014 Jan 24.

A strategy to select suitable physicochemical attributes of amino acids for protein fold recognition.

BMC Bioinformatics. 2013 Jul 24;14:233. doi: 10.1186/1471-2105-14-233.

Accurate prediction of protein structural class.

PLoS One. 2012;7(6):e37653. doi: 10.1371/journal.pone.0037653. Epub 2012 Jun 19.

Fold homology detection using sequence fragment composition profiles of proteins.

Proteins. 2010 Oct;78(13):2745-56. doi: 10.1002/prot.22788.
