Algorithms for optimal protein structure alignment.

Affiliation

Department of Computer Science, University of Northern Iowa, Cedar Falls, IA 50614, USA.

Publication information

Bioinformatics. 2009 Nov 1;25(21):2751-6. doi: 10.1093/bioinformatics/btp530. Epub 2009 Sep 4.

PMID:19734152

Abstract

MOTIVATION

Structural alignment is an important tool for understanding the evolutionary relationships between proteins. However, finding the best pairwise structural alignment is difficult because two structures admit infinitely many possible superpositions. Unlike the sequence alignment problem, which has a polynomial-time solution, the structural alignment problem has not even been classified as solvable.

RESULTS

We study one of the most widely used measures of protein structural similarity, defined as the number of pairs of residues in two proteins that can be superimposed under a predefined distance cutoff. We prove that, for any two proteins, this measure can be optimized for all but finitely many distance cutoffs. Our method leads to a series of algorithms for optimizing other structure similarity measures, including the measures commonly used in protein structure prediction experiments. We also present a polynomial-time algorithm for finding a near-optimal superposition of two proteins. Aside from having a relatively low cost, the near-optimal algorithm returns a superposition of provable quality. In other words, the difference between the score of the returned superposition and the score of an optimal superposition can be explicitly computed and used to determine whether the returned superposition is, in fact, the best superposition.
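The similarity measure described above can be made concrete for a single, fixed superposition. The sketch below is not the authors' algorithm. It assumes that the superposition (a rotation matrix `R` and translation vector `t`) is already given, that each residue is represented by one 3D point (for example its Cα atom), and that counted pairs must form an order-preserving, one-to-one alignment. Under those assumptions, the best count for that one superposition follows from a simple dynamic program. The function names `transform` and `count_pairs_within_cutoff` are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix = Sequence[Sequence[float]]  # 3x3 rotation matrix, row-major


def transform(p: Point, rotation: Matrix, translation: Point) -> Point:
    """Apply x -> R x + t to one point."""
    return tuple(
        sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
        for r in range(3)
    )


def count_pairs_within_cutoff(
    a: Sequence[Point],
    b: Sequence[Point],
    rotation: Matrix,
    translation: Point,
    cutoff: float,
) -> int:
    """Largest number of order-preserving, one-to-one residue pairs (i, j)
    with |a[i] - (R b[j] + t)| <= cutoff, for ONE fixed superposition."""
    moved = [transform(p, rotation, translation) for p in b]
    n, m = len(a), len(moved)
    # best[i][j] = best count using residues a[:i] and moved[:j]
    best: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close = 1 if dist(a[i - 1], moved[j - 1]) <= cutoff else 0
            best[i][j] = max(
                best[i - 1][j],              # leave a[i-1] unpaired
                best[i][j - 1],              # leave b[j-1] unpaired
                best[i - 1][j - 1] + close,  # pair a[i-1] with b[j-1]
            )
    return best[n][m]


# Usage: a straight three-residue chain and a copy rotated -90 degrees about z.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (0.0, -3.8, 0.0), (0.0, -7.6, 0.0)]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
rot_z90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]  # +90 degrees about z
no_shift = (0.0, 0.0, 0.0)

print(count_pairs_within_cutoff(a, b, identity, no_shift, 1.0))  # 1
print(count_pairs_within_cutoff(a, b, rot_z90, no_shift, 1.0))   # 3
```

This function scores only one superposition. The difficulty the abstract refers to lies in the outer maximization over infinitely many superpositions: the count is piecewise constant in `R` and `t`, so small changes to the superposition can change the score abruptly.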

CONTACT

poleksic@cs.uni.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
