Bergsten Johannes
Department of Ecology and Environmental Science, Umeå University, SE-90187 Umeå, Sweden.
Cladistics. 2005 Apr;21(2):163-193. doi: 10.1111/j.1096-0031.2005.00059.x.
The history of long-branch attraction (LBA), and in particular the methods suggested to date to detect and avoid the artifact, is reviewed. Methods suggested to avoid LBA artifacts include excluding long-branch taxa, excluding faster-evolving third codon positions, using inference methods less sensitive to LBA such as likelihood, the Aguinaldo et al. approach, sampling more taxa to break up long branches, and sampling more characters, especially of another kind; the pros and cons of these are discussed. Methods suggested to detect LBA are numerous and include methodological disconcordance, RASA, separate partition analyses, parametric simulation, random outgroup sequences, long-branch extraction, split decomposition and spectral analysis. Less than 10 years ago it was doubted whether LBA occurred in real datasets. Today, examples are numerous in the literature and it is argued that the development of methods to deal with the problem is warranted. A 16 kbp dataset of placental mammals and a combined morphological and molecular dataset of gall wasps are used to illustrate the particularly common problem of LBA of problematic ingroup taxa to outgroups. The preferred methods of separate partition analysis, methodological disconcordance, and long-branch extraction are used to demonstrate detection. It is argued that since outgroup taxa almost always represent long branches, and as such risk misplacing long-branched ingroup taxa, phylogenetic analyses should always be run both with and without the outgroups included. This will detect whether the outgroup only roots the ingroup or simultaneously alters the ingroup topology; previous studies have shown that in the latter case the altered topology is most often the worse one. Apart from showing that LBA to outgroups is the major and most common problem, scanning the literature also revealed the ill-advised comfort taken in high support values from thousands of characters but very few taxa in the age of genomics.
Taxon sampling is crucial for an accurate phylogenetic estimate, and trust cannot be placed in whole mitochondrial or chloroplast genome studies with only a few taxa, despite their high support values. The placental mammal example demonstrates that parsimony analysis will be prone to LBA through the attraction of the tenrec to the distant marsupial outgroups. In addition, the murid rodents, which gave rise to the classic "the guinea-pig is not a rodent" hypothesis in 1996, are also shown to be attracted to the outgroup by nuclear genes, although including the morphological evidence for rodents and Glires overcomes the artifact. The gall wasp example illustrates that Bayesian analyses with a partition-specific GTR + Γ + I model give conflicting resolutions of clades, each with a posterior probability of 1.0, when ingroup-alone and outgroup-rooted topologies are compared, and that this is due to long-branch attraction to the outgroup.
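The recommendation to run analyses with and without the outgroup comes down to a topology comparison, which can be sketched in Python. This is only an illustration and not code from the paper: it assumes the two trees have already been inferred elsewhere, represents them as nested tuples of taxon names, and uses hypothetical taxa (A to E as ingroup, O1 and O2 as outgroup). The helper names `leaves`, `prune`, `splits` and `ingroup_conflict` are invented for this sketch. The outgroup-rooted tree is pruned to the ingroup, and its unrooted bipartitions are compared with those of the ingroup-only tree.

```python
def leaves(tree):
    """Return the set of taxon names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def prune(tree, keep):
    """Restrict a tree to the taxa in `keep`, collapsing single-child nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def splits(tree):
    """Return the non-trivial unrooted bipartitions of a tree.

    Each split is stored as the side that does NOT contain a fixed
    reference taxon, so the result is independent of root position.
    """
    all_taxa = leaves(tree)
    ref = min(all_taxa)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        side = all_taxa - clade if ref in clade else clade
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        for child in node:
            walk(child)

    walk(tree)
    return found


def ingroup_conflict(tree_with_outgroup, tree_ingroup_only, outgroup):
    """Splits found only with the outgroup, and only without it."""
    ingroup = leaves(tree_with_outgroup) - frozenset(outgroup)
    with_og = splits(prune(tree_with_outgroup, ingroup))
    without_og = splits(tree_ingroup_only)
    return with_og - without_og, without_og - with_og


# Hypothetical example: with the outgroup, taxon E sits at the base of the
# ingroup; without it, E groups with D.
with_outgroup = (("O1", "O2"), ("E", (("A", "B"), ("C", "D"))))
ingroup_only = (("A", "B"), ("C", ("D", "E")))

only_with, only_without = ingroup_conflict(
    with_outgroup, ingroup_only, outgroup=["O1", "O2"]
)
print("Splits only in the outgroup-rooted analysis:", [sorted(s) for s in only_with])
print("Splits only in the ingroup-alone analysis:", [sorted(s) for s in only_without])
```

If both returned sets are empty, the outgroup only roots the ingroup. Non-empty sets mean that including the outgroup also changed the ingroup topology, which is the situation the abstract flags as a possible sign of long-branch attraction to the outgroup. The comparison on its own does not say which topology is correct.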