Plesiomorphic character states cause systematic errors in molecular phylogenetic analyses: a simulation study.

Author information

Kück Patrick, Wägele J Wolfgang

Affiliations

The Natural History Museum, Cromwell Road, SW7 5BD, London, UK.

Zoologisches Forschungsmuseum Alexander Koenig, Adenauerallee 160, 53113, Bonn, Germany.

Publication information

Cladistics. 2016 Aug;32(4):461-478. doi: 10.1111/cla.12132. Epub 2015 Jul 20.

Abstract

Analysis of sequence data using time-reversible substitution models and maximum likelihood (ML) algorithms is currently the most popular method to infer phylogenies, despite the fact that results often contradict each other. Searching for sources of error we focus on a hitherto neglected feature of these methods: character polarity is usually thought to be irrelevant in ML analyses. Mechanisms that lead to wrong tree topologies were analysed at the level of split-supporting site patterns. In simulations, plesiomorphic site patterns can be identified by comparison with known root sequences. These patterns cause some surprising effects: Using data sets generated with simulations of sequence evolution along a variety of topologies and inferring trees using the same (correct) model, we show for cases of branch-length heterogeneity that (i) as already known, ML analyses can fail to recover the correct tree even when the correct substitution model is used, but also that (ii) plesiomorphic character states cause substantial mistakes and therefore character polarity is relevant, and (iii) accumulating chance similarities on long branches are far less misleading than plesiomorphic states accumulating on shorter branches. The artefacts occur when branch lengths are heterogeneous. The systematic errors disappear for the most part when the sites with symplesiomorphies supporting false clades are deleted from the data set. We conclude that many of the phylogenies published during the past decades may be false due to the neglected effects of symplesiomorphies.
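
The abstract describes two steps that can be illustrated in code: identifying plesiomorphic site patterns by comparison with a known root sequence, and deleting the sites whose symplesiomorphies support a false clade. The following is a minimal Python sketch of that idea, not the authors' implementation. The function names `classify_site` and `drop_plesiomorphic_support`, the dictionary-based alignment, and the strict definition of "support" (all clade members share one state that no other taxon has) are assumptions made for illustration. The label `derived` covers any shared state that differs from the root, so it does not distinguish true apomorphies from chance similarities.

```python
def classify_site(column, root_state, clade):
    """Classify one alignment column relative to a candidate clade.

    column     -- dict mapping taxon name to its state at this site
    root_state -- state of the known (simulated) root sequence at this site
    clade      -- set of taxon names forming the candidate group
    """
    inside = {column[t] for t in clade}
    outside = {s for t, s in column.items() if t not in clade}
    # Assumed definition of split support: the clade shares exactly one
    # state, and that state occurs in no taxon outside the clade.
    if len(inside) != 1 or not outside or inside & outside:
        return "no_support"
    (shared,) = inside
    # A shared state identical to the root state is ancestral (plesiomorphic).
    return "plesiomorphic" if shared == root_state else "derived"


def drop_plesiomorphic_support(alignment, root, clade):
    """Remove sites where the clade is supported only by a root-like state.

    alignment -- dict mapping taxon name to an aligned sequence string
    root      -- root sequence string of the same length
    Returns the filtered alignment and the list of retained site indices.
    """
    taxa = list(alignment)
    keep = []
    for i, root_state in enumerate(root):
        column = {t: alignment[t][i] for t in taxa}
        if classify_site(column, root_state, clade) != "plesiomorphic":
            keep.append(i)
    filtered = {t: "".join(alignment[t][i] for i in keep) for t in taxa}
    return filtered, keep
```

In a simulation the root sequence is known, so a check of this kind is possible; for empirical data the root states would have to be estimated, which this sketch does not attempt.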
