Evaluating the Performance of Probabilistic Algorithms for Phylogenetic Analysis of Big Morphological Datasets: A Simulation Study.

Affiliations

Department of Biological Sciences, University of Alberta, 11455 Saskatchewan Drive, Edmonton, Alberta T6G 2E9, Canada.

Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA.

Publication information

Syst Biol. 2020 Nov 1;69(6):1088-1105. doi: 10.1093/sysbio/syaa020.

Abstract

Reconstructing the tree of life is an essential task in evolutionary biology. It demands accurate phylogenetic inference for both extant and extinct organisms, the latter being almost entirely dependent on morphological data. While parsimony methods have traditionally dominated the field of morphological phylogenetics, a rapidly growing number of studies now employ probabilistic methods (maximum likelihood and Bayesian inference). The present-day toolkit of probabilistic methods offers varied software with distinct algorithms and assumptions for reaching global optimality. However, benchmark performance assessments of different software packages for the analysis of morphological data, particularly in the era of big data, are still lacking. Here, we test the performance of four major probabilistic software packages under variable taxonomic sampling and missing data conditions: the Bayesian inference-based programs MrBayes and RevBayes, and the maximum likelihood-based programs IQ-TREE and RAxML. We evaluated software performance by calculating the distance between inferred and true trees using a variety of metrics, including the Robinson-Foulds (RF), Matching Splits (MS), and Kuhner-Felsenstein (KF) distances. Our results show that increased taxonomic sampling improves the accuracy, precision, and resolution of reconstructed topologies across all tested probabilistic software applications and all levels of missing data. Under the RF metric, the Bayesian inference applications were the most consistent, accurate, and robust to variation in taxonomic sampling in all tested conditions, especially at high levels of missing data, with little difference in performance between the two tested programs. The MS metric favored more resolved topologies, which were generally produced by IQ-TREE. Adding more taxa dramatically reduced performance disparities between programs. Importantly, our results suggest that the RF metric penalizes incorrectly resolved nodes (false positives) more severely than the MS metric, which instead tends to penalize polytomies. If false positives are to be avoided in systematics, Bayesian inference should be preferred over maximum likelihood for the analysis of morphological data.
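
The claim that RF penalizes incorrectly resolved nodes more severely than unresolved ones follows from how RF counts splits. The Python sketch below illustrates this under stated assumptions and is not the authors' evaluation code: it assumes unrooted trees written as nested tuples (a hypothetical format chosen for brevity, not the output format of MrBayes, RevBayes, IQ-TREE, or RAxML) and uses the unnormalized RF distance, that is, the number of non-trivial splits present in one tree but not the other. The helper names (leaves, splits, robinson_foulds, split_errors) are hypothetical. The MS and KF distances are not implemented here.

```python
from itertools import chain

# Hypothetical input format for this sketch: an unrooted tree written as
# nested tuples of taxon names, e.g. (("A", "B"), "C", ("D", "E")).


def leaves(node):
    """Return the set of taxon names below a nested-tuple node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset(chain.from_iterable(leaves(child) for child in node))


def splits(tree):
    """Return the non-trivial bipartitions (splits) of an unrooted tree.

    Each split is stored as the side that does NOT contain the
    alphabetically first taxon, so A|B and B|A compare equal.
    """
    taxa = leaves(tree)
    ref = min(taxa)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        side = clade if ref not in clade else taxa - clade
        # Non-trivial splits have at least two taxa on each side.
        if 2 <= len(side) <= len(taxa) - 2:
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found


def robinson_foulds(tree1, tree2):
    """Unnormalized RF distance: size of the symmetric difference of split sets."""
    if leaves(tree1) != leaves(tree2):
        raise ValueError("trees must share the same taxon set")
    return len(splits(tree1) ^ splits(tree2))


def split_errors(inferred, true):
    """Return (false positives, false negatives) of an inferred tree's splits."""
    s_inf, s_true = splits(inferred), splits(true)
    return len(s_inf - s_true), len(s_true - s_inf)


true_tree = (("A", "B"), "C", ("D", "E"))
wrong_tree = (("A", "C"), "B", ("D", "E"))  # one incorrectly resolved node
polytomy_tree = ("A", "B", "C", ("D", "E"))  # same node left unresolved

print(robinson_foulds(true_tree, wrong_tree))     # 2
print(robinson_foulds(true_tree, polytomy_tree))  # 1
print(split_errors(wrong_tree, true_tree))        # (1, 1)
print(split_errors(polytomy_tree, true_tree))     # (0, 1)
```

In this five-taxon example, leaving the node unresolved costs one split (a false negative), while resolving it incorrectly costs two (a false positive plus the true split it displaces). For real Newick trees, established libraries such as DendroPy provide RF (symmetric difference) implementations; the sketch only illustrates the counting.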
