Dolan Michael A, Keil Matthias, Baker David S
Tripos Informatics Research Center, 1699 South Hanley Road, St. Louis, Missouri 63144, USA.
Proteins. 2008 Sep;72(4):1243-58. doi: 10.1002/prot.22022.
Although the number of known protein structures is increasing, the number of protein sequences without determined structures is still much larger. Three-dimensional (3D) protein structure information helps in understanding functional mechanisms, but solving structures by X-ray crystallography or NMR is often a lengthy and difficult process. A relatively fast way of determining a protein's 3D structure is to construct a computer model using homologous sequence and structure information. Considerable work has gone into the algorithms that make up the ORCHESTRAR homology modeling program in the SYBYL software package. This novel homology modeling tool combines algorithms for modeling conserved cores, variable regions, and side chains. All of these algorithms follow the same paradigm: they draw on existing knowledge from multiple templates and on the underlying protein environment knowledgebase, an approach that will become even more powerful as the number of experimentally derived protein structures increases. To determine how ORCHESTRAR compares to Composer (a widely used but older tool), homology models of 18 proteins were constructed with each program so that each step in the modeling process could be compared in detail. The proteins modeled include kinases, dihydrofolate reductase, HIV protease, and factor Xa. In almost all cases ORCHESTRAR produced models with lower root-mean-squared deviation (RMSD) values when compared with structures determined by X-ray crystallography or NMR. Moreover, ORCHESTRAR produced a homology model for three target sequences for which Composer failed to produce one. Data for RMSD comparisons of structurally conserved cores, structurally variable regions, and side-chain conformations are presented, as well as analyses of active site and protein-protein interface configurations.
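The RMSD metric used for these comparisons can be illustrated with a minimal sketch. It assumes the model and the experimental structure have already been superimposed and that their atoms are listed in matching order; the abstract does not state which atoms were compared or how the structures were superimposed, so the function name `rmsd` and the toy coordinates are illustrative assumptions, not the authors' protocol.

```python
import math


def rmsd(model, reference):
    """Root-mean-squared deviation between two equal-length lists of (x, y, z) points.

    Assumes both coordinate sets are already superimposed and atom-matched;
    no alignment or fitting is performed here.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, reference):
        squared_sum += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared_sum / len(model))


# Hypothetical toy coordinates in angstroms (not data from the study).
reference_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
# Every atom of the "model" is displaced by 1 angstrom along z.
model_atoms = [(x, y, z + 1.0) for (x, y, z) in reference_atoms]

print(rmsd(model_atoms, reference_atoms))
```

A lower value indicates that the model lies closer to the experimental structure. In practice the two structures would first be optimally superimposed, and the RMSD would be computed over a chosen atom subset such as the C-alpha atoms of the conserved core.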