Dolan Michael A, Noah James W, Hurt Darrell
Bioinformatics and Computational Biosciences Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA.
Methods Mol Biol. 2012;857:399-414. doi: 10.1007/978-1-61779-588-6_18.
The number of known protein sequences is orders of magnitude higher than the number of known three-dimensional protein structures. This gap results from the growth of large-scale genomic sequencing projects, the failure of many proteins to crystallize or of their crystals to diffract well, or a simple lack of resources. An alternative to experimental structure determination is to use one of the many available homology modeling programs to produce a computational model of a protein. These models are built using information from known structures of similar proteins. Here, we compare the ability of a number of popular homology modeling programs to produce quality models from user-defined target-template sequence alignments over a range of circumstances, including low sequence identity, variable sequence length, and interfaces with a protein or small molecule. The programs evaluated are Prime, SWISS-MODEL, MOE, MODELLER, ROSETTA, Composer, ORCHESTRAR, and I-TASSER. The proteins to be modeled were chosen to test a range of sequence identities, sequence lengths, and protein motifs, and all are of scientific importance. They include HIV-1 protease, kinases, dihydrofolate reductase, a viral capsid protein, and factor Xa, among others. For the most part, the programs produce similar results. For example, all programs are able to produce reasonable models when sequence identities are >30%, and all programs have difficulty producing complete models when sequence identities are lower. However, certain programs fare slightly better than others in certain situations, and we attempt to provide insight on this topic.
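Because the >30% sequence identity threshold is central to the comparison, it may help to see how such a value can be computed from a target-template alignment. The following is a minimal sketch, not the procedure used in the study: the function names `percent_identity` and `above_identity_threshold` are hypothetical, and the choice to count only columns where both sequences have a residue (ignoring gapped columns) is an assumption, since other denominators, such as the full alignment length or the shorter sequence length, are also in common use and give different values.

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identity between two rows of a pairwise alignment.

    Assumption: gaps are written as '-' and columns containing a gap
    in either sequence are excluded from the denominator.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")

    matches = 0   # columns where both residues are identical
    compared = 0  # columns where both sequences have a residue

    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def above_identity_threshold(identity: float, threshold: float = 30.0) -> bool:
    """True when identity is strictly greater than the threshold (default 30%)."""
    return identity > threshold


if __name__ == "__main__":
    # Toy alignment for illustration only; not data from the study.
    target = "ACDE-G"
    template = "ACDFHG"
    identity = percent_identity(target, template)
    print(f"identity = {identity:.1f}%")
    print("above 30% threshold:", above_identity_threshold(identity))
```

In practice the alignment would come from whichever alignment tool supplies the user-defined target-template alignment; the sketch only covers the identity calculation on already-aligned strings.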