Department of Biology, Colorado State University, Fort Collins, CO 80523-1878, USA.
Mol Phylogenet Evol. 2010 Dec;57(3):1004-16. doi: 10.1016/j.ympev.2010.09.004. Epub 2010 Sep 16.
We used random sequences to determine which alignment methods are most prone to creating artifactual resolution and branch support in phylogenetic trees derived from the resulting alignments. We compared four alignment methods: progressive pairwise alignment, simultaneous multiple alignment of sequence fragments, local pairwise alignment, and direct optimization. Implied alignments created using direct optimization provided more artifactual support than progressive pairwise alignment methods, which in turn generally provided more artifactual support than simultaneous and local alignment methods. For progressive pairwise alignment, local pairwise alignment, and implied alignments, artifactual support derived from base pairs was generally reinforced when gap characters were incorporated. The amount of artifactual resolution and support was generally greater for simulated nucleotide sequences than for simulated amino acid sequences. In the context of direct optimization, the differences between static and dynamic approaches to calculating support were extreme, ranging from maximal to nearly minimal support. When direct optimization is applied to highly divergent sequences, it is important that dynamic, rather than static, characters be used when calculating branch support. In contrast to the tree-based approaches to alignment, simultaneous alignment of sequences using the similarity criterion generally does not create alignments that are biased in favor of any particular tree topology.
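The random-sequence input described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the sequences were generated, so everything here is an assumption made for illustration: residues are drawn independently with equal frequencies, and the number of taxa, the sequence length, and the helper names `random_sequences` and `to_fasta` are hypothetical.

```python
import random

# Residue alphabets; equal residue frequencies are an assumption,
# not something stated in the abstract.
NUCLEOTIDES = "ACGT"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_sequences(alphabet, n_taxa, length, seed=None):
    """Return {taxon name: sequence} with residues drawn independently
    and uniformly from `alphabet` (hypothetical helper)."""
    rng = random.Random(seed)  # private generator so runs are reproducible
    return {
        f"taxon_{i + 1}": "".join(rng.choice(alphabet) for _ in range(length))
        for i in range(n_taxa)
    }


def to_fasta(seqs):
    """Format the sequences as unaligned FASTA text, a common input
    format for alignment programs."""
    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())


if __name__ == "__main__":
    # Illustrative sizes only; the study's actual dimensions are not
    # given in the abstract.
    nucleotide_set = random_sequences(NUCLEOTIDES, n_taxa=4, length=60, seed=1)
    print(to_fasta(nucleotide_set))
```

Because such sequences share no evolutionary history, any resolution or branch support recovered from a tree built on their alignment is artifactual, which is the quantity the study compares across alignment methods.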