Jou Jonathan D, Jain Swati, Georgiev Ivelin S, Donald Bruce R
1 Department of Computer Science, Duke University , Durham, North Carolina.
2 Department of Biochemistry, Duke University Medical Center , Durham, North Carolina.
J Comput Biol. 2016 Jun;23(6):413-24. doi: 10.1089/cmb.2015.0194. Epub 2016 Jan 8.
Sparse energy functions that ignore long range interactions between residue pairs are frequently used by protein design algorithms to reduce computational cost. Current dynamic programming algorithms that fully exploit the optimal substructure produced by these energy functions only compute the GMEC. This disproportionately favors the sequence of a single, static conformation and overlooks better binding sequences with multiple low-energy conformations. Provable, ensemble-based algorithms such as A* avoid this problem, but A* cannot guarantee better performance than exhaustive enumeration. We propose a novel, provable, dynamic programming algorithm called Branch-Width Minimization* (BWM*) to enumerate a gap-free ensemble of conformations in order of increasing energy. Given a branch-decomposition of branch-width w for an n-residue protein design with at most q discrete side-chain conformations per residue, BWM* returns the sparse GMEC in O([Formula: see text]) time and enumerates each additional conformation in merely O([Formula: see text]) time. We define a new measure, Total Effective Search Space (TESS), which can be computed efficiently a priori before BWM* or A* is run. We ran BWM* on 67 protein design problems and found that TESS discriminated between BWM*-efficient and A*-efficient cases with 100% accuracy. As predicted by TESS and validated experimentally, BWM* outperforms A* in 73% of the cases and computes the full ensemble or a close approximation faster than A*, enumerating each additional conformation in milliseconds. Unlike A*, the performance of BWM* can be predicted in polynomial time before running the algorithm, which gives protein designers the power to choose the most efficient algorithm for their particular design problem.
蛋白质设计算法经常使用忽略残基对之间长程相互作用的稀疏能量函数,以降低计算成本。当前充分利用这些能量函数产生的最优子结构的动态规划算法仅计算全局最小能量构象(GMEC)。这过度偏向于单一静态构象的序列,而忽略了具有多个低能量构象的更好的结合序列。诸如A等基于可证明的整体算法避免了这个问题,但A不能保证比穷举枚举有更好的性能。我们提出了一种新颖的、可证明的动态规划算法,称为分支宽度最小化*(BWM*),以按能量增加的顺序枚举无间隙的构象整体。对于每个残基最多有q个离散侧链构象的n残基蛋白质设计,给定分支宽度为w的分支分解,BWM在O([公式:见原文])时间内返回稀疏GMEC,并且仅在O([公式:见原文])时间内枚举每个额外的构象。我们定义了一种新的度量,总有效搜索空间(TESS),它可以在运行BWM或A之前有效地先验计算。我们在67个蛋白质设计问题上运行了BWM,发现TESS以100%的准确率区分了BWM高效和A高效的情况。正如TESS预测并经实验验证的那样,BWM在73%的情况下优于A,并且比A更快地计算完整的整体或近似值,以毫秒为单位枚举每个额外的构象。与A不同,BWM*的性能可以在运行算法之前在多项式时间内预测,这使蛋白质设计师能够为他们特定的设计问题选择最有效的算法。