Roberts Kyle E, Gainza Pablo, Hallen Mark A, Donald Bruce R
Department of Computer Science, Duke University, Durham, NC.
Department of Biochemistry, Duke University Medical Center, Durham, NC.
Proteins. 2015 Oct;83(10):1859-1877. doi: 10.1002/prot.24870. Epub 2015 Aug 24.
Despite significant successes in structure-based computational protein design in recent years, protein design algorithms must be improved to increase the biological accuracy of new designs. Protein design algorithms search through an exponential number of protein conformations, protein ensembles, and amino acid sequences in an attempt to find globally optimal structures with a desired biological function. To improve the biological accuracy of protein designs, it is necessary to increase both the amount of protein flexibility allowed during the search and the overall size of the design, while guaranteeing that the lowest-energy structures and sequences are found. DEE/A*-based algorithms are the most prevalent provable algorithms in the field of protein design and can provably enumerate a gap-free list of low-energy protein conformations, which is necessary for ensemble-based algorithms that predict protein binding. We present two classes of algorithmic improvements to the A* algorithm that greatly increase the efficiency of A*. First, we analyze the effect of ordering the expansion of mutable residue positions within the A* tree and present a dynamic residue ordering that reduces the number of A* nodes that must be visited during the search. Second, we propose new methods to improve the conformational bounds used to estimate the energies of partial conformations during the A* search. The residue ordering techniques and improved bounds can be combined for additional increases in A* efficiency. Our enhancements enable all A*-based methods to more fully search protein conformation space, which will ultimately improve the accuracy of complex biomedically relevant designs.
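As a rough illustration of the kind of search the abstract describes, the following is a minimal sketch of A* over rotamer assignments with a pairwise-decomposable energy function. It is not the authors' implementation: it expands residue positions in a fixed order (0, 1, 2, ...) rather than the dynamic ordering proposed in the paper, and it uses a simple textbook-style lower bound rather than the improved bounds the paper introduces. The names `astar_gmec`, `lower_bound`, `partial_energy`, and `pair_energy`, and the `intra`/`pair` table layout, are all hypothetical and chosen only for this example.

```python
import heapq
from math import inf


def pair_energy(pair, i, r, j, s):
    """Pairwise energy lookup; the (assumed) table stores only keys with i < j."""
    return pair[(i, j)][r][s] if i < j else pair[(j, i)][s][r]


def partial_energy(intra, pair, assign):
    """g-score: exact energy of the positions assigned so far (0..k-1)."""
    k = len(assign)
    e = sum(intra[i][assign[i]] for i in range(k))
    for i in range(k):
        for j in range(i + 1, k):
            e += pair_energy(pair, i, assign[i], j, assign[j])
    return e


def lower_bound(intra, pair, assign):
    """h-score: admissible lower bound on the energy still to be added.

    For each unassigned position j, take the best rotamer s, counting its
    self energy, its exact interactions with assigned positions, and the
    minimum possible interaction with each later unassigned position.
    """
    n, k = len(intra), len(assign)
    total = 0.0
    for j in range(k, n):
        best = inf
        for s in range(len(intra[j])):
            e = intra[j][s]
            e += sum(pair_energy(pair, i, assign[i], j, s) for i in range(k))
            for m in range(j + 1, n):
                e += min(pair_energy(pair, j, s, m, t)
                         for t in range(len(intra[m])))
            best = min(best, e)
        total += best
    return total


def astar_gmec(intra, pair):
    """Return (energy, assignment, nodes_expanded) for the lowest-energy
    conformation, expanding positions in fixed order (an assumption of this
    sketch, not the paper's dynamic ordering)."""
    n = len(intra)
    heap = [(lower_bound(intra, pair, ()), ())]
    expanded = 0
    while heap:
        f, assign = heapq.heappop(heap)
        if len(assign) == n:
            # First complete node popped is optimal because h never overestimates.
            return f, assign, expanded
        expanded += 1
        pos = len(assign)  # next position to expand under the fixed ordering
        for r in range(len(intra[pos])):
            child = assign + (r,)
            g = partial_energy(intra, pair, child)
            h = lower_bound(intra, pair, child)
            heapq.heappush(heap, (g + h, child))
    return inf, None, expanded


if __name__ == "__main__":
    # Toy energies for three positions with 2, 2, and 3 rotamers (made-up values).
    intra = [[0, 1], [2, 0], [1, 1, 0]]
    pair = {
        (0, 1): [[3, 0], [0, 2]],
        (0, 2): [[1, 0, 2], [0, 3, 1]],
        (1, 2): [[0, 2, 1], [1, 0, 4]],
    }
    print(astar_gmec(intra, pair))
```

Because the bound never overestimates, complete conformations leave the priority queue in nondecreasing energy order, so continuing to pop after the first result would yield the gap-free low-energy list the abstract refers to. The `nodes_expanded` counter is included only to make the effect of a different position ordering or a tighter bound easy to measure in an experiment of this kind.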