Wu Sitao, Skolnick Jeffrey, Zhang Yang
Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, Lawrence, KS 66047, USA.
BMC Biol. 2007 May 8;5:17. doi: 10.1186/1741-7007-5-17.
Predicting three-dimensional protein structures from amino-acid sequences is an important unsolved problem in computational structural biology. The problem becomes considerably easier when the structure of a close homolog has already been solved, because high-resolution models can then be built by aligning the target sequence to the solved homologous structure. However, for sequences without similar folds in the Protein Data Bank (PDB) library, models have to be predicted from scratch. Progress in ab initio structure modeling has been slow. The aim of this study was to extend the TASSER (threading/assembly/refinement) method to ab initio modeling and to examine systematically its ability to fold small single-domain proteins.
We developed I-TASSER, an iterative implementation of the TASSER method, and tested its folding ability on three benchmark sets of small proteins. First, I-TASSER models were generated for 16 small proteins (< 90 residues); they had an average Cα root-mean-square deviation (RMSD) of 3.8 Å, and 6 of them had a Cα-RMSD < 2.5 Å. The overall result was comparable with that of the all-atom ROSETTA simulation, but I-TASSER required far less central processing unit (CPU) time (5 CPU hours for I-TASSER vs. 150 CPU days for ROSETTA). Second, I-TASSER was applied to 20 small proteins (< 120 residues) and folded four of them with a Cα-RMSD < 2.5 Å. The average Cα-RMSD of the I-TASSER models was 3.9 Å, compared with 5.9 Å for the TOUCHSTONE-II software. Finally, 20 non-homologous small proteins (< 120 residues) were taken from the PDB library. For this third benchmark, the average Cα-RMSD was 3.9 Å, with seven cases having a Cα-RMSD < 2.5 Å.
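The Cα-RMSD values quoted above measure how far a model's Cα atoms lie from those of the native structure after the two are optimally superimposed. As an illustration only, the following is a minimal sketch of that calculation using the Kabsch algorithm with NumPy. It is not the authors' evaluation code; the function name `ca_rmsd` and its arguments are hypothetical, and it assumes the two structures have already been reduced to equally long, residue-matched arrays of Cα coordinates.

```python
import numpy as np


def ca_rmsd(model, native):
    """RMSD between two matched (N, 3) coordinate sets after optimal
    rigid-body superposition (Kabsch algorithm). Units follow the input,
    so coordinates in angstroms give an RMSD in angstroms."""
    P = np.asarray(model, dtype=float)
    Q = np.asarray(native, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) coordinate arrays of equal shape")

    # Remove the translational difference by centring both sets.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)

    # Covariance matrix between the two sets and its SVD.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)

    # Guard against an improper rotation (reflection).
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate the model onto the native structure and measure the deviation.
    diff = (R @ P.T).T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

With such a function, a model would count toward the "< 2.5 Å" tallies reported above when `ca_rmsd(model_ca, native_ca) < 2.5`, where `model_ca` and `native_ca` are placeholder names for the two coordinate arrays.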
Our simulation results show that I-TASSER can consistently predict the correct fold, and sometimes produce high-resolution models, for small single-domain proteins. Compared with other ab initio modeling methods such as ROSETTA and TOUCHSTONE-II, the average performance of I-TASSER is either much better or similar at a much lower computational cost. These data, together with the strong performance of the automated I-TASSER server (the Zhang-Server) in the 'free modeling' section of the recent Critical Assessment of Structure Prediction (CASP7) experiment, demonstrate new progress in automated ab initio model generation. The I-TASSER server is freely available to academic users at http://zhang.bioinformatics.ku.edu/I-TASSER.