Altschul S F
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA.
Proteins. 1998 Jul 1;32(1):88-96.
Based on the observation that a single mutational event can delete or insert multiple residues, affine gap costs for sequence alignment charge a penalty for the existence of a gap, and a further length-dependent penalty. From structural or multiple alignments of distantly related proteins, it has been observed that conserved residues frequently fall into ungapped blocks separated by relatively nonconserved regions. To take advantage of this structure, a simple generalization of affine gap costs is proposed that allows nonconserved regions to be effectively ignored. The distribution of scores from local alignments using these generalized gap costs is shown empirically to follow an extreme value distribution. Examples are presented for which generalized affine gap costs yield superior alignments from the standpoints both of statistical significance and of alignment accuracy. Guidelines for selecting generalized affine gap costs are discussed, as is their possible application to multiple alignment.
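The two cost schemes described above can be illustrated with a minimal sketch. The abstract does not state the formulas, so the parameterization below is an assumption: an affine gap of k residues is charged gap_open + gap_extend * k, and a generalized gap that leaves k1 residues of one sequence and k2 of the other unaligned is charged gap_open + gap_extend * |k1 - k2| + pair_skip * min(k1, k2). The function and parameter names are hypothetical, and the numeric values are chosen only for illustration.

```python
def affine_gap_cost(length, gap_open, gap_extend):
    """Affine cost: a penalty for the gap's existence plus a
    length-dependent penalty (assumed form: open + extend * length)."""
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0  # no gap, no charge
    return gap_open + gap_extend * length


def generalized_affine_gap_cost(k1, k2, gap_open, gap_extend, pair_skip):
    """Assumed generalized form: k1 residues of one sequence and k2 of the
    other are left unaligned. Each skipped pair costs pair_skip, and the
    length difference is charged like an ordinary affine extension."""
    if k1 < 0 or k2 < 0:
        raise ValueError("unaligned lengths must be non-negative")
    if k1 == 0 and k2 == 0:
        return 0  # nothing skipped, no charge
    return gap_open + gap_extend * abs(k1 - k2) + pair_skip * min(k1, k2)


# Illustrative values only (not taken from the paper).
GAP_OPEN, GAP_EXTEND, PAIR_SKIP = 11, 1, 2

# A nonconserved region of 10 residues facing 8 residues, skipped as one gap.
skipped_as_one = generalized_affine_gap_cost(10, 8, GAP_OPEN, GAP_EXTEND, PAIR_SKIP)

# The same region modeled as two independent affine gaps.
two_affine_gaps = (affine_gap_cost(10, GAP_OPEN, GAP_EXTEND)
                   + affine_gap_cost(8, GAP_OPEN, GAP_EXTEND))
```

Under these assumed values, skipping the region as a single generalized gap costs 29, against 40 for two separate affine gaps, which is the sense in which a nonconserved region can be cheaply ignored. When one of the two lengths is zero, the generalized cost under this parameterization equals the ordinary affine cost.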