Giaquinta Emanuele, Pozzi Laura
Department of Computer Science, University of Helsinki, Helsinki, Finland.
J Comput Biol. 2013 Aug;20(8):593-609. doi: 10.1089/cmb.2012.0266.
Protein Structure Prediction (PSP) is the problem of predicting the three-dimensional native structure of a protein given its primary structure, i.e., the corresponding sequence of amino acids. Different approaches have been proposed to model this problem, and this research explores the prediction of optimal structures using the well studied simplified lattice Hydrophobic and Polar (HP) model--in particular, on the 2D square lattice. We present a twofold result. First, we devise a new upper bound for the number of contacts achievable by an HP sequence, and show that it is in several cases more stringent than the upper bound previously known in literature. Then, we present an innovative algorithm that outperforms the state of the art in exact approaches for the prediction of optimal structures in lattice protein model, for 2D square lattices. The algorithm, called minwalk and based on a heavily pruned exhaustive search, also outperforms the state of the art in non-exact approaches in several cases. Due to this algorithm, it is now possible to prove optimal results in the square 2D lattice, for standard HP sequences of size up to 80 elements, which were only best-known-results previously. Furthermore, we provide the degeneracy (i.e. all optimal solutions) of such benchmark sequences, which was unknown in literature. These results can be a useful tool to foster advances in further research.
蛋白质结构预测(PSP)是在已知蛋白质一级结构(即相应的氨基酸序列)的情况下预测其三维天然结构的问题。人们已经提出了不同的方法来对这个问题进行建模,本研究探索使用经过充分研究的简化晶格疏水和极性(HP)模型来预测最优结构——特别是在二维正方形晶格上。我们给出了两方面的结果。首先,我们为HP序列可实现的接触数设计了一个新的上限,并表明在几种情况下它比文献中先前已知的上限更为严格。然后,我们提出了一种创新算法,在晶格蛋白质模型中二维正方形晶格的最优结构预测的精确方法中,该算法优于现有技术。该算法名为minwalk,基于大量剪枝的穷举搜索,在几种情况下也优于非精确方法中的现有技术。由于该算法,现在有可能证明对于长度达80个元素的标准HP序列在二维正方形晶格中的最优结果,而这些结果之前只是最知名的结果。此外,我们提供了此类基准序列的简并性(即所有最优解),这在文献中是未知的。这些结果可以成为推动进一步研究进展的有用工具。