Weak correlation between predictive power of individual sequence patterns and overall prediction accuracy in proteins.

Author information

Rooman M J, Wodak S J

Affiliation

Unité de Conformation des Macromolécules Biologiques, Université Libre de Bruxelles, Belgium.

Publication information

Proteins. 1991;9(1):69-78. doi: 10.1002/prot.340090108.

Abstract

Patterns in amino acid properties (polar, hydrophobic, etc.) that characterize secondary structure motifs are derived from a database containing 75 protein structures, with the aim of circumventing the limitations due to database size so as to increase the structure prediction score. Many such sequence-structure associations with high intrinsic predictive power are found, which turn out to be correct 78% of the time when applied individually to proteins outside the learning set. Based on these associations, a prediction method is developed, which reaches a score of 62% on the three states alpha-helix, beta-strand, and loop, without using additional constraints. Though this score is quite good compared to that of other available prediction methods, it is much lower than could be expected from the high intrinsic predictive power of the associations used. The reasons underlying this surprising result, which indicate that prediction score and intrinsic predictive power are only weakly coupled, are discussed. It is also shown that the size of the present database still seriously limits prediction scores, even when property patterns are used, and that higher scores are expected in larger databases. Clues are provided on the relative influence of neglecting spatial interactions on prediction efficiency, suggesting that, in sufficiently large databases, predicted secondary structures would correspond to those formed early in the folding process. This hypothesis is tested by comparing the present predictions with available experimental data on early protein folding intermediates and on small peptides that adopt a relatively stable conformation in water. Although admittedly there are still too few such data, results suggest that the hypothesis might be well founded.
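
The two quantities contrasted in the abstract can be made concrete with a small sketch. It is not the authors' method, and the abstract does not state which reasons the paper gives for the weak coupling. The toy sequence, the `None` convention for residues where no association applies, and the fallback to loop ("L") are all assumptions made here for illustration; they show only one possible way that associations can be right most of the time when they fire while the overall three-state score stays much lower.

```python
from typing import List, Optional

# Hypothetical toy data: true 3-state labels (H = helix, E = strand, L = loop).
true_states: List[str] = list("HHHHHEEEEELLLLLHHHHH")

# Hypothetical output of individual associations: a state where one fires,
# None where no association applies to that residue.
pattern_calls: List[Optional[str]] = [
    "H", "H", "H", None, None,   # helix segment
    "E", "E", None, None, None,  # strand segment
    "L", "H", None, None, None,  # loop segment (one wrong call)
    "H", "H", "E", None, None,   # helix segment (one wrong call)
]

def intrinsic_precision(calls: List[Optional[str]], truth: List[str]) -> float:
    """Fraction of fired associations that match the true state."""
    fired = [(c, t) for c, t in zip(calls, truth) if c is not None]
    return sum(c == t for c, t in fired) / len(fired)

def three_state_score(calls: List[Optional[str]], truth: List[str],
                      default: str = "L") -> float:
    """Fraction of all residues predicted correctly.

    Residues with no call are assigned `default` (an assumption of this
    sketch, not something taken from the paper).
    """
    filled = [c if c is not None else default for c in calls]
    return sum(c == t for c, t in zip(filled, truth)) / len(truth)

precision = intrinsic_precision(pattern_calls, true_states)
score = three_state_score(pattern_calls, true_states)
print(f"precision of fired associations: {precision:.2f}")
print(f"overall three-state score:       {score:.2f}")
```

In this toy case the fired associations are correct 8 times out of 10, yet only 11 of the 20 residues end up correctly assigned, because half of the residues are not covered by any association.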
