IEEE/ACM Trans Comput Biol Bioinform. 2021 Mar-Apr;18(2):489-499. doi: 10.1109/TCBB.2019.2928809. Epub 2021 Apr 8.
Protein design, also known as the inverse protein folding problem, is the identification of a protein sequence that folds into a target protein structure. Protein design has been proven to be NP-hard. While other researchers focus on designing heuristics with an emphasis on new scoring functions, we propose a replica-exchange Monte Carlo (REMC) search algorithm that uses a greedy strategy to converge faster. Drawing on biological insight, we construct an evolutionary profile from the structural homologs of the target protein to encode the amino acid variability at each position. The evolutionary profile guides the REMC search, and the greedy approach provides appreciable exploration and exploitation of the sequence-structure fitness surface. We terminate a simulation trajectory once stagnation is detected. A series of sequence- and structure-level validations establishes the quality of our designs. On a benchmark dataset, our algorithm achieves an average root-mean-square deviation of 1.21 Å between the target and designed proteins when modeled with existing protein folding software. In addition, our algorithm achieves an overall 6.16-fold speedup. In molecular dynamics simulations, four of the five selected designed proteins show stability comparable to or better than that of the corresponding target proteins.
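To make the search loop concrete, the following is a minimal sketch of a generic REMC sequence search with stagnation-based early termination. It is not the authors' implementation: the abstract does not specify the greedy strategy, the scoring function, or the stagnation criterion, so this sketch uses standard Metropolis acceptance, a hypothetical `toy_energy` that scores a sequence only by its log-likelihood under the profile, and a hypothetical `patience` counter. The profile format (one dictionary of residue probabilities per position) and all function names are assumptions made for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_energy(seq, profile):
    """Hypothetical score: negative log-likelihood of seq under the profile."""
    return -sum(math.log(col.get(aa, 1e-6)) for aa, col in zip(seq, profile))


def mutate(seq, rng):
    """Symmetric proposal: replace one random position with a uniform residue."""
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice(AMINO_ACIDS) + seq[i + 1:]


def remc_design(profile, temperatures, sweeps=500, patience=100, seed=0):
    """Run one replica per temperature; return the best (sequence, energy) seen."""
    rng = random.Random(seed)
    n = len(profile)
    seqs = ["".join(rng.choice(AMINO_ACIDS) for _ in range(n)) for _ in temperatures]
    energies = [toy_energy(s, profile) for s in seqs]
    best_e, best_seq = min(zip(energies, seqs))
    stale = 0
    for _ in range(sweeps):
        # Metropolis step within each replica at its own temperature.
        for k, temp in enumerate(temperatures):
            cand = mutate(seqs[k], rng)
            e = toy_energy(cand, profile)
            delta = e - energies[k]
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                seqs[k], energies[k] = cand, e
        # Attempt exchanges between neighbouring temperatures.
        for k in range(len(temperatures) - 1):
            d = (1 / temperatures[k] - 1 / temperatures[k + 1]) * (
                energies[k] - energies[k + 1]
            )
            if d >= 0 or rng.random() < math.exp(d):
                seqs[k], seqs[k + 1] = seqs[k + 1], seqs[k]
                energies[k], energies[k + 1] = energies[k + 1], energies[k]
        # Stop early once the best score has not improved for `patience` sweeps.
        sweep_e, sweep_seq = min(zip(energies, seqs))
        if sweep_e < best_e - 1e-9:
            best_e, best_seq, stale = sweep_e, sweep_seq, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_seq, best_e


# Illustrative three-position profile; the probabilities are made up.
profile = [{"A": 0.9, "G": 0.1}, {"L": 0.7, "I": 0.3}, {"K": 0.8, "R": 0.2}]
seq, score = remc_design(profile, temperatures=[0.5, 1.0, 2.0, 4.0])
print(seq, round(score, 3))
```

The low-temperature replicas exploit the current best regions of the fitness surface while the high-temperature replicas explore more freely, and the exchange step lets good sequences migrate toward the cold end. A realistic design pipeline would replace `toy_energy` with a structure-aware scoring function.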