Comparative performance of Bayesian and AIC-based measures of phylogenetic model uncertainty.

Author information

Alfaro Michael E, Huelsenbeck John P

Affiliation

School of Biological Sciences, P.O. Box 644236, Pullman, Washington 99164-4236, USA.

Publication information

Syst Biol. 2006 Feb;55(1):89-96. doi: 10.1080/10635150500433565.

DOI:10.1080/10635150500433565

PMID:16507526

Abstract

Reversible-jump Markov chain Monte Carlo (RJ-MCMC) is a technique for simultaneously evaluating multiple related (but not necessarily nested) statistical models that has recently been applied to the problem of phylogenetic model selection. Here we use a simulation approach to assess the performance of this method and compare it to Akaike weights, a measure of model uncertainty that is based on the Akaike information criterion. Under conditions where the assumptions of the candidate models matched the generating conditions, both Bayesian and AIC-based methods perform well. The 95% credible interval contained the generating model close to 95% of the time. However, the size of the credible interval differed, with the Bayesian credible set containing approximately 25% to 50% fewer models than an AIC-based credible interval. The posterior probability was a better indicator of the correct model than the Akaike weight when all assumptions were met, but both measures performed similarly when some model assumptions were violated. Models in the Bayesian posterior distribution were also more similar to the generating model in their number of parameters and were less biased in their complexity. In contrast, Akaike-weighted models were more distant from the generating model and biased towards slightly greater complexity. The AIC-based credible interval appeared to be more robust to the violation of the rate homogeneity assumption. Both AIC and Bayesian approaches suggest that substantial uncertainty can accompany the choice of model for phylogenetic analyses, so alternative candidate models should be examined in the analysis of phylogenetic data. [AIC; Akaike weights; Bayesian phylogenetics; model averaging; model selection; model uncertainty; posterior probability; reversible jump]
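
To make the two quantities compared in the abstract concrete, the following is a minimal sketch of how Akaike weights and a 95% credible set of models are commonly computed. It uses the standard definition w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), where Δ_i is the AIC of model i minus the smallest AIC among the candidates. The function names, the candidate model labels, and the AIC scores are hypothetical and chosen only for illustration; the abstract does not describe the authors' exact procedure.

```python
import math

def akaike_weights(aic_scores):
    """Convert AIC scores into Akaike weights that sum to 1."""
    best = min(aic_scores.values())
    # Relative likelihood of each model given its AIC difference from the best.
    rel = {m: math.exp(-0.5 * (a - best)) for m, a in aic_scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}

def credible_set(weights, level=0.95):
    """Smallest set of top-weighted models whose cumulative weight reaches `level`."""
    chosen, cumulative = [], 0.0
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    for model, w in ranked:
        chosen.append(model)
        cumulative += w
        if cumulative >= level:
            break
    return chosen

# Hypothetical AIC scores for four candidate substitution models (not from the paper).
aic = {"JC69": 2052.3, "HKY85": 2041.0, "GTR": 2040.1, "GTR+G": 2039.2}

weights = akaike_weights(aic)
models_95 = credible_set(weights)

for model in models_95:
    print(f"{model}: weight = {weights[model]:.3f}")
```

The same `credible_set` helper could, under the same assumptions, be applied to posterior model probabilities estimated from an RJ-MCMC run (for example, the fraction of samples in which the chain visited each model), which is what allows the sizes of the two sets to be compared directly.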
