Huang Xiaoqiu, Brutlag Douglas L
Department of Computer Science, Iowa State University, Ames, IA 50011-1040, USA.
Nucleic Acids Res. 2007;35(2):678-86. doi: 10.1093/nar/gkl1063. Epub 2006 Dec 19.
The level of conservation between two homologous sequences often varies among sequence regions: functionally important domains are more conserved than the remaining regions. Thus, multiple parameter sets should be used when aligning homologous sequences, with a stringent parameter set for highly conserved regions and a moderate parameter set for weakly conserved regions. We describe an alignment algorithm that allows dynamic use of multiple parameter sets with different levels of stringency in computing an optimal alignment of two sequences. The algorithm dynamically considers various candidate alignments, partitions each candidate alignment into sections, and determines the most appropriate set of parameter values for each section of the alignment. The algorithm and its local alignment version are implemented in a computer program named GAP4. The local alignment algorithm in GAP4, the local alignment algorithm in its predecessor GAP3, and an ordinary local alignment program, SIM, were evaluated on 257,716 pairs of homologous sequences from 100 protein families. On 168,475 of the 257,716 pairs (65.4%), alignments from GAP4 were more statistically significant than alignments from GAP3 and SIM.
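The abstract does not give GAP4's recurrences, so the following is only a simplified illustration of the idea under our own assumptions, not the GAP4 algorithm. It is a Smith-Waterman-style local alignment in which every column is scored by one of several parameter sets, and a hypothetical `switch_penalty` is charged whenever adjacent columns use different sets. A "section" here is simply a maximal run of columns scored with the same set. The `ParamSet` class, the linear gap penalty, the switching penalty, and all numeric values are assumptions made for illustration; practical aligners typically use substitution matrices and affine gap penalties instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamSet:
    """Hypothetical scoring scheme: match/mismatch scores and a linear gap cost."""
    name: str
    match: int
    mismatch: int
    gap: int  # cost per gap column (subtracted from the score)


def multi_param_local_align(a, b, params, switch_penalty):
    """Local alignment where each column is scored by one of several parameter
    sets; changing sets between adjacent columns costs switch_penalty (an
    assumption of this sketch). Returns (score, columns), each column being
    (char from a or '-', char from b or '-', parameter-set name)."""
    n, m, K = len(a), len(b), len(params)
    NEG = float("-inf")
    # H[k][i][j]: best score of a local alignment whose last column consumes
    # the prefixes a[:i], b[:j] and is scored with params[k].
    H = [[[NEG] * (m + 1) for _ in range(n + 1)] for _ in range(K)]
    back = {}  # (k, i, j) -> (predecessor state or None, prev i, prev j)
    best, best_state = 0, None
    for i in range(n + 1):
        for j in range(m + 1):
            for k, p in enumerate(params):
                moves = []  # (previous i, previous j, score of this column)
                if i > 0 and j > 0:
                    same = a[i - 1] == b[j - 1]
                    moves.append((i - 1, j - 1, p.match if same else p.mismatch))
                if i > 0:
                    moves.append((i - 1, j, -p.gap))  # a[i-1] against a gap
                if j > 0:
                    moves.append((i, j - 1, -p.gap))  # b[j-1] against a gap
                for pi, pj, col in moves:
                    # Either start a new local alignment here (score 0 before) ...
                    prev_score, prev_state = 0, None
                    # ... or extend one ending at (pi, pj) under any set kp.
                    for kp in range(K):
                        s = H[kp][pi][pj] - (switch_penalty if kp != k else 0)
                        if s > prev_score:
                            prev_score, prev_state = s, (kp, pi, pj)
                    if prev_score + col > H[k][i][j]:
                        H[k][i][j] = prev_score + col
                        back[(k, i, j)] = (prev_state, pi, pj)
                if H[k][i][j] > best:
                    best, best_state = H[k][i][j], (k, i, j)
    # Trace back from the best-scoring state to recover the columns.
    columns, state = [], best_state
    while state is not None:
        k, i, j = state
        prev_state, pi, pj = back[state]
        columns.append((a[i - 1] if pi < i else "-",
                        b[j - 1] if pj < j else "-",
                        params[k].name))
        state = prev_state
    columns.reverse()
    return best, columns


if __name__ == "__main__":
    strict = ParamSet("strict", match=3, mismatch=-3, gap=4)
    moderate = ParamSet("moderate", match=1, mismatch=-1, gap=1)
    score, cols = multi_param_local_align(
        "ACGTAAACGT", "ACGTCCACGT", [strict, moderate], switch_penalty=1)
    print(score)  # 20
    print([c[2] for c in cols])
```

In this toy run, the two conserved `ACGT` blocks are scored with the stringent set and the weakly conserved middle (`AA` against `CC`) with the moderate set, giving a score of 20, whereas the stringent set alone scores the best local alignment at 18. The extra dimension over parameter sets multiplies the work of ordinary local alignment by roughly the square of the number of sets in this naive form.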