Kück Patrick, Wägele J Wolfgang
The Natural History Museum, Cromwell Road, SW7 5BD, London, UK.
Zoologisches Forschungsmuseum Alexander Koenig, Adenauerallee 160, 53113, Bonn, Germany.
Cladistics. 2016 Aug;32(4):461-478. doi: 10.1111/cla.12132. Epub 2015 Jul 20.
Analysis of sequence data using time-reversible substitution models and maximum likelihood (ML) algorithms is currently the most popular method for inferring phylogenies, even though results often contradict each other. Searching for sources of error, we focus on a hitherto neglected feature of these methods: character polarity is usually thought to be irrelevant in ML analyses. Mechanisms that lead to wrong tree topologies were analysed at the level of split-supporting site patterns. In simulations, plesiomorphic site patterns can be identified by comparison with known root sequences. These patterns cause some surprising effects. Using data sets generated by simulating sequence evolution along a variety of topologies, and inferring trees with the same (correct) model, we show for cases of branch-length heterogeneity that (i) as already known, ML analyses can fail to recover the correct tree even when the correct substitution model is used; (ii) plesiomorphic character states cause substantial mistakes, so character polarity is relevant; and (iii) chance similarities accumulating on long branches are far less misleading than plesiomorphic states accumulating on shorter branches. The artefacts occur when branch lengths are heterogeneous. The systematic errors largely disappear when the sites with symplesiomorphies supporting false clades are deleted from the data set. We conclude that many of the phylogenies published during the past decades may be false due to the neglected effects of symplesiomorphies.