Xu Y, Uberbacher E C
Computer Science and Mathematics Division, Oak Ridge National Laboratory, TN 37831-6364, USA.
Comput Appl Biosci. 1996 Dec;12(6):511-7. doi: 10.1093/bioinformatics/12.6.511.
This paper presents an algorithm for constructing an optimal alignment between a three-dimensional protein structure template and an amino acid sequence. A protein structure template is given as a sequence of amino acid residue positions in three-dimensional space, along with an array of physical properties attached to each position; these residue positions are sequentially grouped into a series of core secondary structures (central helices and beta sheets). In addition to match scores and gap penalties, as in a traditional sequence-sequence alignment problem, the quality of a structure-sequence alignment is also determined by interaction preferences among amino acids aligned with structure positions that are spatially close (we call these 'long-range interactions'). Although it is known that constructing such a structure-sequence alignment in the most general form is NP-hard, our algorithm runs in polynomial time when restricted to structures with a 'modest' number of long-range amino acid interactions. In the current work, long-range interactions are limited to interactions between amino acids from different core secondary structures. Dividing the series of core secondary structures into two subseries creates a cut set of long-range interactions. If we use N, M and C to represent the size of an amino acid sequence, the size of a structure template, and the maximum cut size of long-range interactions, respectively, the algorithm finds an optimal structure-sequence alignment in O(21C NM) time, a polynomial function of N and M when C = O(log(N + M)). When running on structure-sequence alignment problems without long-range intersections, i.e. C = 0, the algorithm achieves the same asymptotic computational complexity of the Smith-Waterman sequence-sequence alignment algorithm.
本文提出了一种用于构建三维蛋白质结构模板与氨基酸序列之间最优比对的算法。蛋白质结构模板以三维空间中氨基酸残基位置的序列形式给出,并附带一个与每个位置相关的物理性质数组;这些残基位置被依次分组为一系列核心二级结构(中心螺旋和β折叠)。与传统的序列 - 序列比对问题一样,除了匹配得分和空位罚分之外,结构 - 序列比对的质量还由与空间上接近的结构位置对齐的氨基酸之间的相互作用偏好来决定(我们将这些称为“远程相互作用”)。虽然已知以最一般形式构建这种结构 - 序列比对是NP难问题,但当限制在具有“适度”数量远程氨基酸相互作用的结构时,我们的算法在多项式时间内运行。在当前工作中,远程相互作用仅限于来自不同核心二级结构的氨基酸之间的相互作用。将核心二级结构系列划分为两个子系列会产生一组远程相互作用的割集。如果我们分别用N、M和C表示氨基酸序列的大小、结构模板的大小以及远程相互作用的最大割集大小,那么该算法在O(21C NM)时间内找到最优结构 - 序列比对,当C = O(log(N + M))时,这是N和M的多项式函数。当在没有远程交叉的结构 - 序列比对问题上运行时,即C = 0,该算法实现了与Smith - Waterman序列 - 序列比对算法相同的渐近计算复杂度。