Nylander Johan A A, Ronquist Fredrik, Huelsenbeck John P, Nieves-Aldrey José Luis
Department of Systematic Zoology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18 D, SE-752 36 Uppsala, Sweden.
Syst Biol. 2004 Feb;53(1):47-67. doi: 10.1080/10635150490264699.
The recent development of Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) techniques has facilitated the exploration of parameter-rich evolutionary models. At the same time, stochastic models have become more realistic (and complex) and have been extended to new types of data, such as morphology. Based on this foundation, we developed a Bayesian MCMC approach to the analysis of combined data sets and explored its utility in inferring relationships among gall wasps based on data from morphology and four genes (nuclear and mitochondrial, ribosomal and protein coding). Examined models range in complexity from those recognizing only a morphological and a molecular partition to those having complex substitution models with independent parameters for each gene. Bayesian MCMC analysis deals efficiently with complex models: convergence occurs faster and more predictably for complex models, mixing is adequate for all parameters even under very complex models, and the parameter update cycle is virtually unaffected by model partitioning across sites. Morphology contributed only 5% of the characters in the data set but nevertheless influenced the combined-data tree, supporting the utility of morphological data in multigene analyses. We used Bayesian criteria (Bayes factors) to show that process heterogeneity across data partitions is a significant model component, although not as important as among-site rate variation. More complex evolutionary models are associated with more topological uncertainty and less conflict between morphology and molecules. Bayes factors sometimes favor simpler models over considerably more parameter-rich models, but the best model overall is also the most complex and Bayes factors do not support exclusion of apparently weak parameters from this model. 
Thus, Bayes factors appear to be useful for selecting among complex models, but it is still unclear whether their use strikes a reasonable balance between model complexity and error in parameter estimates.
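As an illustration of the model comparison described above, a Bayes factor is the ratio of the marginal likelihoods of two models, usually reported on the log scale as 2 ln B10. The following is a minimal sketch and not the authors' implementation. The function names and the sample values are hypothetical, and the harmonic mean of sampled likelihoods is used only as one simple (and known to be unstable) estimator of the marginal likelihood; the abstract does not state which estimator was applied.

```python
import math


def harmonic_mean_log_marginal(log_likelihoods):
    """Estimate ln P(D | M) as the log of the harmonic mean of the
    likelihoods sampled by MCMC (hypothetical helper, assumed estimator).

    Computed on the log scale with a log-sum-exp shift to avoid underflow.
    """
    n = len(log_likelihoods)
    if n == 0:
        raise ValueError("need at least one sampled log-likelihood")
    negated = [-ll for ll in log_likelihoods]
    shift = max(negated)
    # ln( (1/n) * sum_i exp(-ll_i) ), evaluated stably
    log_mean_inverse = shift + math.log(
        sum(math.exp(x - shift) for x in negated) / n
    )
    return -log_mean_inverse


def two_ln_bayes_factor(log_marginal_1, log_marginal_0):
    """Return 2 ln B10 = 2 * (ln P(D | M1) - ln P(D | M0))."""
    return 2.0 * (log_marginal_1 - log_marginal_0)


# Hypothetical sampled log-likelihoods for a complex and a simple model.
complex_model_samples = [-1000.2, -1001.5, -999.8, -1000.9]
simple_model_samples = [-1012.4, -1011.7, -1013.1, -1012.0]

log_ml_complex = harmonic_mean_log_marginal(complex_model_samples)
log_ml_simple = harmonic_mean_log_marginal(simple_model_samples)

# Positive values favor the complex model over the simple one.
print(two_ln_bayes_factor(log_ml_complex, log_ml_simple))
```

Under the commonly cited Kass and Raftery scale, values of 2 ln B10 above 10 are read as very strong evidence for the first model; whether that threshold suits a given phylogenetic comparison is a judgment call.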