Harrison Luke B, Larsson Hans C E
Redpath Museum, McGill University, 859 Sherbrooke Street West, Montreal, Quebec, Canada H3A 0C4
Syst Biol. 2015 Mar;64(2):307-24. doi: 10.1093/sysbio/syu098. Epub 2014 Dec 18.
Likelihood-based methods are commonplace in phylogenetic systematics. Although much effort has been directed toward likelihood-based models for molecular data, comparatively less work has addressed models for discrete morphological character (DMC) data. Among-character rate variation (ACRV) may confound phylogenetic analysis, but there have been few analyses of the magnitude and distribution of rate heterogeneity among DMCs. Using 76 data sets covering a range of plants, invertebrates, and vertebrates, we used a modified version of MrBayes to test equal, gamma-distributed, and lognormally distributed models of ACRV, integrating across phylogenetic uncertainty using Bayesian model selection. We found that in approximately 80% of data sets, unequal-rates models outperformed equal-rates models, especially among larger data sets. Moreover, although most data sets were equivocal, more data sets favored the lognormal rate distribution over the gamma rate distribution, lending some support to more complex character correlations than in molecular data. Parsimony estimation of the underlying rate distributions in several data sets suggests that the lognormal distribution is preferred when there are many slowly evolving characters and fewer quickly evolving characters. The commonly adopted four-rate-category discrete approximation used for molecular data was found to be sufficient to approximate a gamma rate distribution with discrete characters. However, in the two data sets tested that favored a lognormal rate distribution, the continuous distribution was better approximated with at least eight discrete rate categories. Although the effect of rate model on the estimation of topology was difficult to assess across all data sets, it appeared relatively minor between the unequal-rates models for the one data set examined carefully.
As in molecular analyses, we argue that researchers should test and adopt the most appropriate model of rate variation for the data set in question. As discrete characters are increasingly used in more sophisticated likelihood-based phylogenetic analyses, it is important that these studies be built on the most appropriate and carefully selected underlying models of evolution.
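To make the idea of a discrete approximation to a lognormal rate distribution concrete, the sketch below splits a lognormal into k equal-probability categories and represents each by its median rate, then rescales so the mean rate is 1. This is only an illustration under stated assumptions: the abstract does not say how the modified MrBayes discretizes the distribution, so the median-per-category scheme, the mean-one scaling, and the function name `discrete_lognormal_rates` are all hypothetical choices, not the authors' implementation.

```python
import math
from statistics import NormalDist


def discrete_lognormal_rates(sigma, k):
    """Return k relative rates approximating a lognormal rate distribution.

    Assumptions (not taken from the paper): each of the k equal-probability
    categories is represented by its median, and the rates are rescaled so
    that their mean is 1.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if sigma < 0:
        raise ValueError("sigma must be non-negative")

    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    # The median of category i sits at cumulative probability (2i + 1) / (2k).
    raw = [
        math.exp(sigma * std_normal.inv_cdf((2 * i + 1) / (2 * k)))
        for i in range(k)
    ]
    mean_rate = sum(raw) / k
    return [r / mean_rate for r in raw]


if __name__ == "__main__":
    # Compare a four-category and an eight-category approximation.
    for k in (4, 8):
        rates = discrete_lognormal_rates(sigma=1.0, k=k)
        print(k, [round(r, 3) for r in rates])
```

With more categories, the highest category reaches further into the long right tail of the lognormal, which is one plausible reason a finer discretization could matter more for this distribution than for the gamma. Setting `sigma` to 0 collapses every category to a rate of 1, which corresponds to the equal-rates model.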