Naor D, Brutlag D L
Department of Biochemistry, Stanford University School of Medicine, CA 94305-5307, USA.
J Comput Biol. 1994 Winter;1(4):349-66. doi: 10.1089/cmb.1994.1.349.
A near-optimal alignment between a pair of sequences is an alignment whose score lies within the neighborhood of the optimal score. We present an efficient method for representing all alignments whose score is within any given delta of the optimal score. The representation is a compact graph that makes it easy to impose additional biological constraints and select one desirable alignment from the large set of alignments. We study the combinatorial nature of near-optimal alignments, and define a set of "canonical" near-optimal alignments. We then show how to enumerate near-optimal alignments efficiently in order of their score, and count their number. When applied to comparisons of two distantly related proteins, near-optimal alignments reveal that the most conserved regions among the near-optimal alignments are the highly structured regions in the proteins. We also show that by counting the number of near-optimal alignments as a function of the distance from the optimal score, we can select a good set of parameters that best constrains the biologically relevant alignments.
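The graph representation and the counting step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes global alignment, a linear gap penalty, and integer scores, and the function names (`successors`, `forward_backward`, `near_optimal_edges`, `count_by_score`) and default scoring values are hypothetical choices made for illustration. An edge of the alignment grid is kept when the best alignment passing through it scores within delta of the optimum, and alignments are counted per score by dynamic programming. Every distinct path is counted separately; the "canonical" reduction mentioned in the abstract is not attempted here.

```python
from collections import Counter

NEG = float("-inf")

def successors(a, b, i, j, match, mismatch, gap):
    """Yield (i2, j2, weight) for each alignment-grid edge leaving cell (i, j)."""
    if i < len(a) and j < len(b):
        yield i + 1, j + 1, (match if a[i] == b[j] else mismatch)
    if i < len(a):
        yield i + 1, j, gap          # a[i] aligned to a gap
    if j < len(b):
        yield i, j + 1, gap          # b[j] aligned to a gap

def forward_backward(a, b, match=1, mismatch=-1, gap=-1):
    """F[i][j]: best score of a[:i] vs b[:j]; B[i][j]: best score of a[i:] vs b[j:]."""
    n, m = len(a), len(b)
    F = [[NEG] * (m + 1) for _ in range(n + 1)]
    B = [[NEG] * (m + 1) for _ in range(n + 1)]
    F[0][0] = 0
    B[n][m] = 0
    for i in range(n + 1):                      # row-major order is topological
        for j in range(m + 1):
            for i2, j2, w in successors(a, b, i, j, match, mismatch, gap):
                F[i2][j2] = max(F[i2][j2], F[i][j] + w)
    for i in range(n, -1, -1):                  # reverse order for suffix scores
        for j in range(m, -1, -1):
            for i2, j2, w in successors(a, b, i, j, match, mismatch, gap):
                B[i][j] = max(B[i][j], w + B[i2][j2])
    return F, B

def near_optimal_edges(a, b, delta, match=1, mismatch=-1, gap=-1):
    """Edges lying on at least one alignment scoring >= optimum - delta."""
    F, B = forward_backward(a, b, match, mismatch, gap)
    best = F[len(a)][len(b)]
    edges = set()
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            for i2, j2, w in successors(a, b, i, j, match, mismatch, gap):
                if F[i][j] + w + B[i2][j2] >= best - delta:
                    edges.add(((i, j), (i2, j2)))
    return edges

def count_by_score(a, b, delta, match=1, mismatch=-1, gap=-1):
    """Map each score >= optimum - delta to the number of alignments with that score."""
    F, B = forward_backward(a, b, match, mismatch, gap)
    n, m = len(a), len(b)
    threshold = F[n][m] - delta
    counts = [[Counter() for _ in range(m + 1)] for _ in range(n + 1)]
    counts[0][0][0] = 1                          # the empty prefix has score 0
    for i in range(n + 1):
        for j in range(m + 1):
            for s, c in counts[i][j].items():
                for i2, j2, w in successors(a, b, i, j, match, mismatch, gap):
                    # Keep a prefix only if its best completion stays near-optimal.
                    if s + w + B[i2][j2] >= threshold:
                        counts[i2][j2][s + w] += c
    return dict(counts[n][m])

if __name__ == "__main__":
    print(sorted(near_optimal_edges("AC", "AC", 0)))
    print(count_by_score("AC", "AC", 3))
```

Sweeping `delta` and summing the values returned by `count_by_score` gives the number of near-optimal alignments as a function of distance from the optimal score, which is the quantity the abstract proposes for comparing scoring parameters. The per-cell score tables make this sketch practical only for short sequences and small deltas.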