Simmons Mark P, Müller Kai F, Webb Colleen T
Department of Biology, Colorado State University, Fort Collins, CO 80523, USA.
Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, 48149 Münster, Germany.
Cladistics. 2011 Aug;27(4):402-416. doi: 10.1111/j.1096-0031.2010.00333.x. Epub 2010 Jul 26.
Alignment of nucleotide and/or amino acid sequences is a fundamental component of sequence-based molecular phylogenetic studies. Here we examined how different alignment methods affect the phylogenetic trees that are inferred from the alignments. We used simulations to determine how alignment errors can lead to systematic biases that affect phylogenetic inference from those sequences. We compared four approaches to sequence alignment: progressive pairwise alignment, simultaneous multiple alignment of sequence fragments, local pairwise alignment and direct optimization. When taking into account branch support, implied alignments produced by direct optimization were found to show the most extreme behaviour (based on the alignment programs for which nearly equivalent alignment parameters could be set) in that they provided the strongest support for the correct tree in the simulations in which it was easy to resolve the correct tree and the strongest support for the incorrect tree in our long-branch-attraction simulations. When applied to alignment-sensitive process partitions with different histories, direct optimization showed the strongest mutual influence between the process partitions when they were aligned and phylogenetically analysed together, which makes detecting recombination more difficult. Simultaneous alignment performed well relative to direct optimization and progressive pairwise alignment across all simulations. Rather than relying upon methods that integrate alignment and tree search into a single step without accounting for alignment uncertainty, as with implied alignments, we suggest that simultaneous alignment using the similarity criterion, within the context of information available on biological processes and function, be applied whenever possible for sequence-based phylogenetic analyses.