Zoology, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil.
Genetics, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil.
PeerJ. 2024 Jan 8;12:e16706. doi: 10.7717/peerj.16706. eCollection 2024.
Recently, many studies have addressed the performance of phylogenetic tree-building methods (maximum parsimony, maximum likelihood, and Bayesian inference), focusing primarily on simulated data. However, for discrete morphological data, there is still no consensus on which method recovers the phylogeny best. To address this lack of consensus, we investigate the performance of different methods using an empirical dataset for hexapods as a model. As an empirical test of performance, we applied normalized indices to measure accuracy (the normalized Robinson-Foulds metric, nRF) and precision, which is measured via resolution as one minus Colless' consensus fork index (1-CFI). Additionally, to further explore phylogenetic accuracy and support measures, we calculated other statistics, such as the true positive rate (statistical power) and the false positive rate (type I error), and constructed receiver operating characteristic plots to visualize the relationship between these statistics. We applied the normalized indices to the trees reconstructed from reanalyses of an empirical discrete morphological dataset from extant Hexapoda, using a well-supported phylogenomic tree as a reference. Maximum likelihood and Bayesian inference applying the k-state Markov (Mk) model (with or without a discrete gamma distribution) performed better, showing higher precision (resolution). Additionally, our results suggest that most available tree topology tests are reliable estimators of the performance measures applied in this study. Thus, we suggest that likelihood-based methods and tree topology tests should be used more often in phylogenetic studies based on discrete morphological characters. Our study provides a fair indication that morphological datasets have robust phylogenetic signal.
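As an illustration of the two normalized indices named above, the following is a minimal sketch, not the authors' implementation. It assumes rooted trees written as nested Python tuples of taxon names, compares them by their non-trivial clades, and normalizes the Robinson-Foulds distance by 2(n - 2), the maximum for two fully resolved rooted trees on n taxa. The abstract does not state which normalization the study used, and the function names and toy taxa (`A` to `E`) are hypothetical.

```python
def clades(tree):
    """Return (non-trivial clades, all tips) for a rooted tree given as nested tuples."""
    found = set()

    def visit(node):
        if isinstance(node, str):          # a leaf is a taxon name
            return frozenset([node])
        tips = frozenset().union(*(visit(child) for child in node))
        found.add(tips)                    # record the clade below this internal node
        return tips

    all_tips = visit(tree)
    found.discard(all_tips)                # the root clade is shared by every tree
    return found, all_tips


def normalized_rf(inferred, reference):
    """Robinson-Foulds distance divided by 2(n - 2) (assumed normalization)."""
    a, tips_a = clades(inferred)
    b, tips_b = clades(reference)
    if tips_a != tips_b:
        raise ValueError("trees must contain the same taxa")
    return len(a ^ b) / (2 * (len(tips_a) - 2))


def one_minus_cfi(tree):
    """One minus Colless' consensus fork index: 0 means fully resolved."""
    found, tips = clades(tree)
    return 1 - len(found) / (len(tips) - 2)


# Toy example: the inferred tree leaves (A, B, C) as an unresolved polytomy.
reference = ((("A", "B"), "C"), ("D", "E"))
inferred = (("A", "B", "C"), ("D", "E"))

print(normalized_rf(inferred, reference))  # one clade differs out of a maximum of six
print(one_minus_cfi(inferred))             # two of three possible clades are resolved
```

In this toy case the inferred tree contains no clade that conflicts with the reference, so its nonzero nRF comes entirely from lost resolution, which is why the two indices are read together.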