Daniel Fischer
Bioinformatics, Department of Computer Science, Ben Gurion University, Beer-Sheva, Israel.
Proteins. 2003 May 15;51(3):434-41. doi: 10.1002/prot.10357.
To gain a better understanding of the biological role of proteins encoded in genome sequences, knowledge of their three-dimensional (3D) structure and function is required. The computational assignment of folds is becoming an increasingly important complement to experimental structure determination. In particular, fold-recognition methods aim to predict approximate 3D models for proteins bearing no sequence similarity to any protein of known structure. However, fully automated structure-prediction methods can currently produce reliable models for only a fraction of these sequences. Using a number of semiautomated procedures, human expert predictors are often able to produce more and better predictions than automated methods. We describe a novel, fully automatic, fold-recognition meta-predictor, named 3D-SHOTGUN, which incorporates some of the strategies human predictors have successfully applied. This new method is reminiscent of the so-called cooperative algorithms of Computer Vision. The inputs to 3D-SHOTGUN are the top models predicted by a number of independent fold-recognition servers. The meta-predictor consists of three steps: (i) assembly of hybrid models, (ii) confidence assignment, and (iii) selection. We have applied 3D-SHOTGUN to an unbiased test set of 77 newly released protein structures sharing no sequence similarity to proteins previously released. Forty-six correct rank-1 predictions were obtained, 30 of which had scores higher than that of the first incorrect prediction, a significant improvement over the performance of all individual servers. Furthermore, the predicted hybrid models were, on average, more similar to their corresponding native structures than those produced by the individual servers. This opens the possibility of generating more accurate, full-atom homology models for proteins with no sequence similarity to proteins of known structure. These improvements represent a step forward toward the wider applicability of fully automated structure-prediction methods at genome scales.
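The evaluation criterion quoted above (correct rank-1 predictions, and how many of them score higher than the first incorrect prediction) can be made concrete with a short sketch. When predictions are sorted by score, the "first incorrect prediction" is the highest-scoring incorrect one, so the second count is the number of correct predictions scoring strictly above it. The function names and the toy scores below are hypothetical and only illustrate the counting; they are not data from the study.

```python
from typing import List, Tuple

# Each entry is (confidence score of the rank-1 model, whether that model is correct).
# Hypothetical representation chosen for illustration only.
Prediction = Tuple[float, bool]


def count_correct(predictions: List[Prediction]) -> int:
    """Number of targets whose rank-1 prediction is correct."""
    return sum(1 for _, correct in predictions if correct)


def correct_before_first_error(predictions: List[Prediction]) -> int:
    """Correct predictions whose score is strictly higher than the
    highest-scoring incorrect prediction (the first error in score order)."""
    wrong_scores = [score for score, correct in predictions if not correct]
    # With no incorrect predictions, every correct one counts.
    threshold = max(wrong_scores) if wrong_scores else float("-inf")
    return sum(1 for score, correct in predictions if correct and score > threshold)


if __name__ == "__main__":
    # Illustrative toy scores, not results from the paper.
    preds = [(52.0, True), (48.5, True), (40.1, True), (33.0, False),
             (35.2, True), (20.0, True), (15.0, False)]
    print(count_correct(preds))               # 5 correct rank-1 predictions
    print(correct_before_first_error(preds))  # 4 score above 33.0
```

The strict inequality means a correct prediction tied with the best incorrect score is not counted, which is the conservative reading of "higher than".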