Lee Seung Yup, Skolnick Jeffrey
Center for the Study of Systems Biology, Georgia Institute of Technology, Atlanta, Georgia 30318, USA.
Proteins. 2007 Jul 1;68(1):39-47. doi: 10.1002/prot.21440.
To improve the accuracy of TASSER models, especially in the limit where threading-provided template alignments are of poor quality, we have developed the TASSER(iter) algorithm, which uses the templates and contact restraints from TASSER-generated models for iterative structure refinement. We apply TASSER(iter) to a large benchmark set of 2,773 nonhomologous single-domain proteins that are ≤200 residues in length and that cover the PDB at the level of 35% pairwise sequence identity. Overall, TASSER(iter) models have a smaller global average RMSD of 5.48 Å, compared with 5.81 Å for the original TASSER models. Targets are classified by prediction difficulty: Easy targets have a good template with a correspondingly good threading alignment, Medium targets have a good template but a poor alignment, and Hard targets have an incorrectly identified template. TASSER(iter) (TASSER) models have an average RMSD of 4.15 Å (4.35 Å) for the Easy set and 9.05 Å (9.52 Å) for the Hard set. The largest reduction in average RMSD is for the Medium set, where the TASSER(iter) models have an average global RMSD of 5.67 Å compared with 6.72 Å for the TASSER models. Seventy percent of the Medium set TASSER(iter) models have a smaller RMSD than the corresponding TASSER models, while 63% of the Easy and 60% of the Hard TASSER models are improved by TASSER(iter). For the foldable cases, where the models have an RMSD to the native structure of <6.5 Å, TASSER(iter) shows a clear improvement over TASSER: the success rate rises from 57.0% to 67.2% for the Medium set, from 32.0% to 34.8% for the Hard set, and from 82.6% to 84.0% for the Easy set, which shows the smallest gain. These results suggest that TASSER(iter) can provide more reliable predictions for targets of Medium difficulty, a range that had resisted improvement in the quality of protein structure predictions.
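The comparisons reported above reduce to simple arithmetic, which the following minimal Python sketch makes explicit. It is not part of TASSER or TASSER(iter). The numbers are copied from the abstract; the names `AVG_RMSD`, `SUCCESS_RATE`, `improvements`, and `success_rate` are hypothetical helpers introduced here for illustration, and the only assumption about the method is the stated criterion that a target counts as foldable when its model RMSD to native is below 6.5 Å.

```python
# Illustrative sketch only; helper names are hypothetical, values are from the abstract.

# Average global RMSD in Angstroms: (TASSER, TASSER(iter)).
AVG_RMSD = {
    "All": (5.81, 5.48),
    "Easy": (4.35, 4.15),
    "Medium": (6.72, 5.67),
    "Hard": (9.52, 9.05),
}

# Success rate in percent (models with RMSD to native < 6.5 A): (TASSER, TASSER(iter)).
SUCCESS_RATE = {
    "Easy": (82.6, 84.0),
    "Medium": (57.0, 67.2),
    "Hard": (32.0, 34.8),
}

FOLDABLE_CUTOFF = 6.5  # Angstroms, the foldability criterion stated in the abstract


def success_rate(rmsds, cutoff=FOLDABLE_CUTOFF):
    """Return the percentage of models whose RMSD to native is below the cutoff."""
    if not rmsds:
        raise ValueError("need at least one RMSD value")
    return 100.0 * sum(1 for r in rmsds if r < cutoff) / len(rmsds)


def improvements(table, lower_is_better):
    """Return the gain of TASSER(iter) over TASSER for each target set."""
    gains = {}
    for name, (tasser, tasser_iter) in table.items():
        delta = tasser - tasser_iter if lower_is_better else tasser_iter - tasser
        gains[name] = round(delta, 2)
    return gains


rmsd_gain = improvements(AVG_RMSD, lower_is_better=True)
success_gain = improvements(SUCCESS_RATE, lower_is_better=False)

for name, gain in rmsd_gain.items():
    print(f"{name}: average RMSD reduced by {gain:.2f} A")
for name, gain in success_gain.items():
    print(f"{name}: success rate increased by {gain:.1f} percentage points")
```

Under these numbers, the Medium set shows the largest gain on both measures, which matches the abstract's conclusion. The `success_rate` helper shows how such a percentage would be obtained from a list of per-target RMSD values, which the abstract itself does not provide.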