Rannala Bruce
Department of Medical Genetics, University of Alberta, Edmonton, Alberta, T6G 2H7, Canada.
Syst Biol. 2002 Oct;51(5):754-60. doi: 10.1080/10635150290102429.
Methods for Bayesian inference of phylogeny from DNA sequences that are based on Markov chain Monte Carlo (MCMC) techniques allow the incorporation of arbitrarily complex models of the DNA substitution process and of other aspects of evolution. This has increased the realism of the models, potentially improving the accuracy of the methods, and is largely responsible for their recent popularity. Another consequence of the increased complexity of models in Bayesian phylogenetics is that, in several cases, the models have become overparameterized. In such cases, some parameters of the model are not identifiable: different combinations of the nonidentifiable parameters yield the same likelihood, so the data cannot discriminate among the candidate parameter values. Overparameterized models can also slow the convergence of MCMC algorithms because of large negative correlations among parameters in the posterior probability distribution. In overparameterized models, functions of the parameters can sometimes be found that are identifiable, and inferences based on these functions are legitimate. Examples are presented of overparameterized models that have been proposed in the context of several Bayesian methods for inferring the relative ages of nodes in a phylogeny when the substitution rate evolves over time.
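The idea of nonidentifiable parameters with an identifiable function can be illustrated with a small sketch. It is not taken from the article: it assumes the Jukes-Cantor (JC69) substitution model and two aligned sequences, and the function name `jc69_log_likelihood` and the numbers used are hypothetical. Under these assumptions the likelihood depends on the substitution rate and the divergence time only through their product, so rate and time are not separately identifiable, whereas the product (the genetic distance) is.

```python
import math

def jc69_log_likelihood(rate, time, n_sites, n_diff):
    """Log-likelihood of two aligned sequences under JC69 (illustrative assumption).

    rate    : substitutions per site per unit time
    time    : time since the two sequences diverged
    n_sites : alignment length
    n_diff  : number of sites at which the sequences differ
    """
    # Expected substitutions per site separating the two sequences.
    # Rate and time enter the likelihood only through this product.
    d = 2.0 * rate * time
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))  # P(site differs)
    p_same = 1.0 - p_diff                             # P(site identical)
    return (n_sites * math.log(0.25)
            + n_diff * math.log(p_diff / 3.0)
            + (n_sites - n_diff) * math.log(p_same))

# Two different (rate, time) pairs with the same product rate * time.
ll_a = jc69_log_likelihood(rate=0.10, time=1.0, n_sites=1000, n_diff=150)
ll_b = jc69_log_likelihood(rate=0.05, time=2.0, n_sites=1000, n_diff=150)

# A pair with a different product, for contrast.
ll_c = jc69_log_likelihood(rate=0.10, time=2.0, n_sites=1000, n_diff=150)

print("same product, same likelihood:", math.isclose(ll_a, ll_b))
print("different product, same likelihood:", math.isclose(ll_a, ll_c))
```

In an MCMC analysis of such a model, sampled values of `rate` and `time` would be expected to trade off against each other along the ridge where their product is constant, which is one way the strong negative posterior correlations mentioned above can arise. Summaries of the product would remain interpretable.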