IEEE/ACM Trans Comput Biol Bioinform. 2020 Nov-Dec;17(6):2119-2130. doi: 10.1109/TCBB.2019.2917452. Epub 2020 Dec 8.
De novo protein structure prediction can be treated as a conformational space optimization problem guided by an energy function. However, designing an accurate energy function, one whose low-energy conformations lie close to the native structure, remains a challenge. Fortunately, recent studies have shown that integrating residue-residue distance information can significantly improve the accuracy of de novo protein structure prediction. In this paper, a two-stage distance feature-based optimization algorithm (TDFO) for de novo protein structure prediction is proposed within the framework of an evolutionary algorithm. In TDFO, a similarity model is first built from feature information that the bisecting K-means algorithm extracts from distance profiles. A selection strategy based on this similarity model is then developed to guide the conformational search and thereby improve the quality of the predicted models. In addition, global and local mutation strategies are designed, and a state estimation strategy is proposed to balance exploration and exploitation of the search space. Experimental results on 35 benchmark proteins show that TDFO improves prediction accuracy for a large portion of the test proteins.
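The abstract does not specify how TDFO turns distance profiles into features or how its similarity model scores a conformation. The sketch below is a hypothetical illustration only. It assumes that each residue pair has a list of candidate distances (in Å) drawn from its distance profile. It also assumes that the centroid of the most populated bisecting K-means cluster serves as that pair's feature, and that similarity is the fraction of pairs whose model distance falls within a tolerance of the feature. The function names, the cluster count `k=3`, the 2 Å tolerance, and the greedy parent/offspring selection rule are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[int, int]  # hypothetical residue-pair key (i, j)


def sse(cluster: List[float]) -> float:
    """Sum of squared errors of a 1-D cluster around its mean."""
    m = sum(cluster) / len(cluster)
    return sum((v - m) ** 2 for v in cluster)


def two_means_1d(values: List[float], iters: int = 50) -> Optional[Tuple[List[float], List[float]]]:
    """Split 1-D values into two clusters with Lloyd's 2-means; None if unsplittable."""
    c1, c2 = min(values), max(values)  # deterministic initialization at the extremes
    if c1 == c2:
        return None  # all values identical (or a single value)
    a: List[float] = []
    b: List[float] = []
    for _ in range(iters):
        a = [v for v in values if abs(v - c1) <= abs(v - c2)]
        b = [v for v in values if abs(v - c1) > abs(v - c2)]
        new1, new2 = sum(a) / len(a), sum(b) / len(b)
        if (new1, new2) == (c1, c2):
            break  # centroids converged
        c1, c2 = new1, new2
    return a, b


def bisecting_kmeans_1d(values: List[float], k: int) -> List[List[float]]:
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(values)]
    while len(clusters) < k:
        clusters.sort(key=sse, reverse=True)
        split = two_means_1d(clusters[0])
        if split is None:
            break  # the worst cluster is already uniform, so every cluster is
        clusters = clusters[1:] + list(split)
    return clusters


def dominant_distance(values: List[float], k: int = 3) -> float:
    """Hypothetical feature: centroid of the most populated cluster."""
    largest = max(bisecting_kmeans_1d(values, k), key=len)
    return sum(largest) / len(largest)


def similarity(model: Dict[Pair, float], features: Dict[Pair, float], tol: float = 2.0) -> float:
    """Fraction of residue pairs whose model distance lies within tol of the feature."""
    hits = sum(1 for pair, d in features.items() if abs(model[pair] - d) <= tol)
    return hits / len(features)


def select(parent: Dict[Pair, float], child: Dict[Pair, float], features: Dict[Pair, float]) -> Dict[Pair, float]:
    """Hypothetical greedy selection: keep the offspring if it is at least as similar."""
    return child if similarity(child, features) >= similarity(parent, features) else parent
```

For example, `dominant_distance([5.1, 5.3, 5.0, 5.2, 9.8, 10.1, 14.9], k=3)` separates the samples into a cluster near 5 Å, one near 10 Å, and an outlier at 14.9 Å, and returns the centroid of the four-sample cluster (about 5.15 Å). Clustering before averaging keeps a multimodal profile from collapsing to a mean (about 7.9 Å here) that matches none of its modes.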