Helles Glennie
University of Copenhagen, Universitetsparken 1, 2100 Copenhagen, Denmark.
J R Soc Interface. 2008 Apr 6;5(21):387-96. doi: 10.1098/rsif.2007.1278.
Protein structure prediction is one of the major challenges in bioinformatics today. Over the past five decades, many different algorithmic approaches have been attempted, and although progress has been made, the problem remains unsolvable even for many small proteins. While the general objective is to predict the three-dimensional structure from the primary sequence, our current knowledge and computational power are simply insufficient to solve a problem of such high complexity. Some prediction algorithms do, however, appear to perform better than others, although it is not always obvious which ones they are, and it is perhaps even less obvious why. In this review, the reported performance results of 18 recently published prediction algorithms are compared. Furthermore, the general algorithmic settings most likely responsible for the differences in reported performance are identified, and the specific settings of each of the 18 prediction algorithms are compared. The reported average normalized r.m.s.d. scores range from 11.17 (worst) to 3.48 (best). Using a performance measure that includes both r.m.s.d. scores and CPU time, the currently best-performing prediction algorithm is identified as the I-TASSER algorithm. Two of the algorithmic settings, protein representation and fragment assembly, were found to have a definite positive influence on the running time and on the predicted structures, respectively. There thus appears to be a clear benefit in incorporating this knowledge into the design of new prediction algorithms.
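The r.m.s.d. scores quoted in the abstract measure how far a predicted structure lies from the native one. As a minimal sketch of the plain (unnormalized) quantity, the Python below computes the root-mean-square deviation between two equally sized sets of 3D coordinates. It assumes the two structures have already been superimposed, and it does not reproduce the normalization or the combined r.m.s.d./CPU-time measure used in the review, neither of which is specified in the abstract. The function name `rmsd` and the toy coordinates are illustrative assumptions only.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized sets of 3D points.

    Assumption: the structures are already superimposed; no alignment is
    performed here, and no length normalization is applied.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between corresponding atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical toy coordinates, not taken from any real protein.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
native = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 2.0)]

print(rmsd(predicted, native))
```

Lower values indicate a closer match to the native structure, which is why 3.48 is the better end of the reported range. In practice an optimal superposition step would normally precede this calculation.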