Genominformatik, Universität Duisburg-Essen/Universitätsklinikum, Germany.
IEEE/ACM Trans Comput Biol Bioinform. 2013 Jan-Feb;10(1):26-36. doi: 10.1109/TCBB.2012.143.
We present a mathematical model and exact algorithm for optimally aligning protein structures using the DALI scoring model. This scoring model is based on comparing the interresidue distance matrices of proteins and is used in the popular DALI software tool, a heuristic method for protein structure alignment. Our model and algorithm extend an integer linear programming approach that has been previously applied for the related, but simpler, contact map overlap problem. To this end, we introduce a novel type of constraint that handles negative score values and relax it in a Lagrangian fashion. The new algorithm, which we call DALIX, is applicable to any distance matrix-based scoring scheme. We also review options that allow to consider fewer pairs of interresidue distances explicitly because their large number hinders the optimization process. Using four known data sets of varying structural similarity, we compute many provably score-optimal DALI alignments. This allowed, for the first time, to evaluate the DALI heuristic in sound mathematical terms. The results indicate that DALI usually computes optimal or close to optimal alignments. However, we detect a subset of small proteins for which DALI fails to generate any significant alignment, although such alignments do exist.
我们提出了一个数学模型和精确算法,用于使用 DALI 评分模型最优地对齐蛋白质结构。该评分模型基于比较蛋白质的残基间距离矩阵,用于流行的 DALI 软件工具,这是一种蛋白质结构对齐的启发式方法。我们的模型和算法扩展了一种整数线性规划方法,该方法先前已应用于相关但更简单的接触图重叠问题。为此,我们引入了一种新的约束类型,用于处理负评分值,并以拉格朗日方式对其进行松弛。我们称之为 DALIX 的新算法适用于任何基于距离矩阵的评分方案。我们还回顾了一些选项,这些选项允许更显式地考虑较少的残基间距离对,因为它们的数量很多会阻碍优化过程。使用四个具有不同结构相似性的已知数据集,我们计算了许多可证明得分最优的 DALI 对齐。这首次允许从合理的数学角度评估 DALI 启发式方法。结果表明,DALI 通常计算最佳或接近最佳的对齐。但是,我们检测到一小部分蛋白质,DALI 无法为其生成任何有意义的对齐,尽管确实存在这种对齐。