Protein structure prediction begins well but ends badly.

Affiliation

Department of Statistics, University of Oxford, Oxford OX1 3TG, England.

Publication information

Proteins. 2010 Apr;78(5):1282-90. doi: 10.1002/prot.22646.

PMID: 20014025

Abstract

The accurate prediction of protein structure, both secondary and tertiary, is an ongoing problem. Over the years, many approaches have been implemented and assessed. Most prediction algorithms start with the entire amino acid sequence and treat all residues identically, regardless of sequence position. Here, we analyze blind prediction data to investigate whether predictive capability varies along the chain. We evaluate free modeling results from recent Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments, as well as the most up-to-date data from EVA, a fully automated blind test of secondary structure prediction servers. The results demonstrate that structure prediction accuracy depends on sequence position. Both secondary and tertiary structure predictions are more accurate in regions near the amino (N)-terminus than in analogous regions near the carboxy (C)-terminus. Eight of 10 secondary structure prediction algorithms assessed by EVA perform significantly better in regions at the N-terminus. CASP data show a similar bias, with N-terminal fragments being predicted more accurately than C-terminal fragments. Two analogous fragments are taken from each model: the N-terminal fragment begins at the start of the most N-terminal secondary structure element (SSE), whereas the C-terminal fragment ends at the end of the most C-terminal SSE. Each fragment is locally superimposed onto its respective native fragment. The relative terminal prediction accuracy, measured as root-mean-square deviation (RMSD), is calculated on an intramodel basis. At a fragment length of 20 residues, the N-terminal fragment is predicted with greater accuracy in 79% of cases.
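The fragment selection and the intramodel comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code and rests on several assumptions: secondary structure is given as a per-residue string in which `H` and `E` mark SSE residues and every other character is coil; the function names (`terminal_fragments`, `n_terminal_win_fraction`, `sign_test_p`) are hypothetical; the RMSD pairs are made-up illustrative numbers, not data from the study; and the exact sign test is a swapped-in, generic way to ask whether such a win fraction differs from 50%, since the abstract does not name the statistical test used. Local superposition and RMSD calculation are assumed to have been done beforehand.

```python
from math import comb
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end) residue indices, end exclusive


def terminal_fragments(ss: str, length: int) -> Tuple[Range, Range]:
    """Pick the two terminal fragments of a given length.

    The N-terminal fragment starts at the first SSE residue; the C-terminal
    fragment ends at the last SSE residue. 'H' and 'E' count as SSE residues
    (an assumption of this sketch). Overlapping fragments are not rejected.
    """
    sse_positions = [i for i, state in enumerate(ss) if state in "HE"]
    if not sse_positions:
        raise ValueError("no secondary structure elements found")
    n_start = sse_positions[0]      # start of the most N-terminal SSE
    c_end = sse_positions[-1] + 1   # one past the end of the most C-terminal SSE
    if c_end - n_start < length:
        raise ValueError("chain too short for the requested fragment length")
    return (n_start, n_start + length), (c_end - length, c_end)


def n_terminal_win_fraction(rmsd_pairs: List[Tuple[float, float]]) -> float:
    """Fraction of models whose N-terminal fragment has the lower RMSD.

    Each pair is (N-terminal RMSD, C-terminal RMSD) for one model, so the
    comparison is intramodel. Ties are excluded from the denominator.
    """
    wins = sum(1 for n, c in rmsd_pairs if n < c)
    losses = sum(1 for n, c in rmsd_pairs if n > c)
    if wins + losses == 0:
        raise ValueError("no decided comparisons")
    return wins / (wins + losses)


def sign_test_p(wins: int, losses: int) -> float:
    """Two-sided exact sign test p-value under a 50/50 null hypothesis."""
    n = wins + losses
    k = max(wins, losses)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Illustrative secondary structure string (hypothetical, 29 residues).
example_ss = "CC" + "H" * 6 + "CCC" + "E" * 5 + "CCC" + "H" * 8 + "CC"
# Illustrative (N-terminal RMSD, C-terminal RMSD) pairs in angstroms (made up).
example_pairs = [(1.2, 2.5), (0.9, 1.4), (3.1, 2.0), (1.8, 1.8), (2.2, 4.0)]

if __name__ == "__main__":
    print(terminal_fragments(example_ss, 5))
    print(n_terminal_win_fraction(example_pairs))
    print(sign_test_p(3, 1))
```

Anchoring both fragments to SSE boundaries rather than to the chain ends keeps disordered terminal tails out of the comparison, and pairing the two RMSD values within each model removes differences in overall model quality from the comparison.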
