David W. Mount
Cold Spring Harb Protoc. 2009 Jul;2009(7):pdb.ip61. doi: 10.1101/pdb.ip61.
Finding a globally optimal alignment of more than two sequences (and especially more than three) is difficult. Such an alignment must include matches, mismatches, and gaps, and it must account for the degree of variation in all of the sequences at the same time. The computation required by exact dynamic programming grows exponentially with the number of sequences. Approximate methods are therefore used. These include progressive global alignment, iterative global alignment, alignments based on locally conserved patterns found in the same order in each sequence, statistical methods that generate probabilistic models of the sequences, and graph-based methods. When 10 or more sequences are being compared, a common first step is to determine the similarity between every pair of sequences in the set. A variety of methods are then available to cluster the sequences into groups of the most closely related sequences or into a phylogenetic tree. This article discusses several of these methods and provides data comparing their utility under various conditions.
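The all-pairs similarity step can be illustrated with a minimal Python sketch. It runs a Needleman-Wunsch global alignment for every pair of sequences and converts percent identity into a distance (1 minus fractional identity). The scoring values (`MATCH`, `MISMATCH`, `GAP`) and the helper names `global_align`, `percent_identity`, and `distance_matrix` are hypothetical choices for illustration, not the article's method. Real tools typically use substitution matrices and affine gap penalties.

```python
from itertools import combinations

# Hypothetical scoring scheme for illustration only: match +1, mismatch -1,
# linear gap penalty -2 per gapped position.
MATCH, MISMATCH, GAP = 1, -1, -2


def global_align(a: str, b: str) -> tuple[int, str, str]:
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + GAP,
                              score[i][j - 1] + GAP)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])  # gap in b
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")  # gap in a
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Fraction of alignment columns holding identical residues."""
    _, x, y = global_align(a, b)
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return same / len(x) if x else 1.0


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Distance = 1 - fractional identity for every unordered pair of names."""
    return {(x, y): 1.0 - percent_identity(seqs[x], seqs[y])
            for x, y in combinations(seqs, 2)}


if __name__ == "__main__":
    seqs = {"s1": "GATTACA", "s2": "GATTACA", "s3": "GATCACA", "s4": "GCATGCA"}
    for (x, y), dist in distance_matrix(seqs).items():
        print(x, y, f"{dist:.3f}")
```

For N sequences this step performs N(N-1)/2 pairwise alignments. The number of alignments grows quadratically with N, whereas exact simultaneous alignment grows exponentially, which is why it is a practical starting point.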
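The clustering step described in the abstract can be sketched with UPGMA (average-linkage clustering). UPGMA is swapped in here as one simple, well-known example; the abstract does not say which clustering methods the article evaluates. The function name `upgma` and its input format, a dictionary mapping each pair of sequence names to a distance, are assumptions for illustration. The output is a Newick tree string with branch lengths.

```python
from itertools import combinations


def upgma(names: list[str], d: dict[tuple[str, str], float]) -> str:
    """Cluster taxa with UPGMA; returns a Newick string with branch lengths."""
    index = {n: i for i, n in enumerate(names)}
    # Active clusters keyed by integer id: (newick text, leaf count, node height).
    clusters = {i: (n, 1, 0.0) for i, n in enumerate(names)}
    dist = {frozenset((index[x], index[y])): v for (x, y), v in d.items()}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        height = dist[frozenset((a, b))] / 2
        ta, na, ha = clusters.pop(a)
        tb, nb, hb = clusters.pop(b)
        new = next_id
        next_id += 1
        # Distance from the merged cluster to each remaining cluster is the
        # leaf-count-weighted average of the two merged clusters' distances.
        for c in clusters:
            dist[frozenset((new, c))] = (
                na * dist[frozenset((a, c))] + nb * dist[frozenset((b, c))]
            ) / (na + nb)
        # Branch length = new node height minus the child's height.
        clusters[new] = (f"({ta}:{height - ha:g},{tb}:{height - hb:g})",
                         na + nb, height)
    tree = next(iter(clusters.values()))[0]
    return tree + ";"


if __name__ == "__main__":
    d = {("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 6,
         ("B", "C"): 4, ("B", "D"): 6, ("C", "D"): 6}
    print(upgma(["A", "B", "C", "D"], d))
```

UPGMA assumes a roughly constant rate of change along all lineages. Neighbor-joining relaxes that assumption and is widely used to build the guide trees that drive progressive alignment.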