Prasad J C, Comeau S R, Vajda S, Camacho C J
Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA.
Bioinformatics. 2003 Sep 1;19(13):1682-91. doi: 10.1093/bioinformatics/btg211.
Even the best sequence alignment methods frequently fail to correctly identify the framework regions for which backbones can be copied from the template into the target structure. Since the underprediction and, more significantly, the overprediction of these regions reduce the quality of the final model, it is of prime importance to recover as much of the true structural alignment between target and template as possible.
We have developed an algorithm called Consensus that consistently provides a high-quality alignment for comparative modeling. The method follows from a benchmark analysis of the 3D models generated by ten alignment techniques for a set of 79 homologous protein structure pairs. For 20 to 40% of the targets, these methods yield models with at least 6 Å root mean square deviation (RMSD) from the native structure. We selected the five top-performing methods and developed a consensus algorithm to generate an improved alignment. Building on the individual strengths of each method, we implemented a set of criteria to remove the alignment segments that are likely to correspond to structurally dissimilar regions. The automated algorithm was validated on a different set of 48 protein pairs, resulting in 2.2 Å average RMSD for the predicted models, and only four cases in which the RMSD exceeded 3 Å. The average length of the alignments was about 75% of that found by standard structural superposition methods. The performance of Consensus was consistent from 2 to 32% target-template sequence identity, and hence it can be used for accurate prediction of framework regions in homology modeling.
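The abstract does not spell out the filtering criteria, so the following is only a minimal sketch of the general idea of a consensus alignment, not the published Consensus algorithm. It assumes each method's alignment is given as a set of (target index, template index) residue pairs, keeps the pairs that a majority of methods agree on, and then discards short isolated runs. The function names and the `min_votes` and `min_len` thresholds are hypothetical and chosen for illustration.

```python
from collections import Counter


def consensus_pairs(alignments, min_votes):
    """Return residue pairs proposed by at least `min_votes` alignments.

    `alignments` is a list of sets of (target_idx, template_idx) pairs.
    If `min_votes` is more than half the number of alignments, each target
    residue can be matched to at most one template residue in the result.
    """
    votes = Counter(pair for aln in alignments for pair in set(aln))
    return sorted(pair for pair, n in votes.items() if n >= min_votes)


def drop_short_segments(pairs, min_len):
    """Keep only gap-free runs of at least `min_len` consecutive pairs.

    `pairs` must be sorted; a run continues while both indices grow by 1.
    """
    kept, run = [], []
    for pair in pairs:
        if run and pair[0] == run[-1][0] + 1 and pair[1] == run[-1][1] + 1:
            run.append(pair)
        else:
            if len(run) >= min_len:
                kept.extend(run)
            run = [pair]
    if len(run) >= min_len:
        kept.extend(run)
    return kept


# Toy input: five hypothetical alignments of the same target-template pair.
methods = [
    {(0, 0), (1, 1), (2, 2), (3, 3), (7, 8)},
    {(0, 0), (1, 1), (2, 2), (3, 3), (7, 8)},
    {(0, 0), (1, 1), (2, 2), (3, 4), (7, 8)},
    {(1, 1), (2, 2), (3, 3), (7, 9)},
    {(1, 1), (2, 2), (3, 3), (7, 8)},
]

agreed = consensus_pairs(methods, min_votes=3)
framework = drop_short_segments(agreed, min_len=3)
print(framework)
```

In this toy input the pair (7, 8) wins the majority vote but forms a run of length one, so the segment filter removes it. This mirrors the trade-off described above: a shorter alignment is accepted in exchange for excluding regions that are less likely to be structurally similar.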