Mount David W
CSH Protoc. 2008 Jun 1;2008:pdb.top40. doi: 10.1101/pdb.top40.
INTRODUCTION

To obtain the best possible alignment between two sequences, it is necessary to include gaps in the alignment and to score them with gap penalties. For aligning DNA sequences, a simple positive score for matches and a negative score for mismatches and gaps are most often used. To score matches and mismatches in alignments of proteins, it is necessary to know how often one amino acid is substituted for another in related proteins. In addition, a method is needed to account for the insertions and deletions that sometimes appear in related DNA or protein sequences. To accommodate such sequence variations, gaps that appear in sequence alignments are given a negative penalty score, reflecting the fact that they are not expected to occur very often. Mathematically speaking, it is very difficult to produce the best possible alignment, either global or local, unless gaps are included in the alignment. This article discusses how to use gaps and gap penalties to optimize pairwise sequence alignments.
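The simple DNA scoring scheme described above can be illustrated with a short sketch. It assumes illustrative values of +1 for a match, -1 for a mismatch, and -2 for each gap position (a linear gap penalty). These numbers are assumptions, not values taken from the article. The sketch uses the standard Needleman-Wunsch dynamic-programming recurrence to find the best-scoring global alignment, and compares that score with a forced ungapped alignment. The function names `global_align` and `ungapped_score` are hypothetical.

```python
def ungapped_score(a, b, match=1, mismatch=-1):
    """Score a position-by-position alignment with no gaps (equal lengths only)."""
    if len(a) != len(b):
        raise ValueError("ungapped alignment requires equal-length sequences")
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    The default scores are illustrative assumptions.
    Returns (score, aligned_a, aligned_b), with '-' marking gap positions.
    """
    n, m = len(a), len(b)
    # F[i][j] holds the best score for aligning a[:i] with b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap          # leading gaps in b
    for j in range(1, m + 1):
        F[0][j] = j * gap          # leading gaps in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # match or mismatch
                          F[i - 1][j] + gap,    # gap in b
                          F[i][j - 1] + gap)    # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, under these assumed scores, `ACGTACGT` and `CGTACGTA` score -8 when forced into an ungapped alignment, because every position mismatches. Allowing one gap in each sequence gives the alignment `ACGTACGT-` over `-CGTACGTA`, which scores 3 (seven matches minus two gap penalties). This illustrates why the best alignment generally cannot be found without gaps.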