Rodrigue Nicolas, Philippe Hervé, Lartillot Nicolas
Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Québec, Canada.
Syst Biol. 2007 Oct;56(5):711-26. doi: 10.1080/10635150701611258.
In recent years, the advent of Markov chain Monte Carlo (MCMC) techniques, coupled with modern computational capabilities, has enabled the study of evolutionary models whose likelihood function has no closed-form solution. However, current Bayesian MCMC applications can incur significant computational costs, as they are based on full sampling from the posterior probability distribution of the parameters of interest. Here, we draw attention to how MCMC techniques can be embedded within normal approximation strategies for more economical statistical computation. The overall procedure is based on an estimate of the first and second moments of the likelihood function, as well as a maximum likelihood estimate. Through examples, we review several MCMC-based methods used in the statistical literature for such estimation, and apply them to constructing posterior distributions under nonanalytical evolutionary models that relax the assumptions of rate homogeneity and of independence between sites. Finally, we use the procedures to conduct Bayesian model selection based on Laplace approximations of Bayes factors, which we find to be accurate and computationally advantageous. Altogether, the methods we expound here, as well as other related approaches from the statistical literature, should prove useful when investigating increasingly complex descriptions of molecular evolution, alleviating some of the difficulties associated with nonanalytical models.
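As an illustration of the Laplace approximation of a Bayes factor mentioned in the abstract, the following is a minimal sketch on a toy binomial model with a uniform prior. The toy model is an assumption chosen only because its exact marginal likelihood, 1/(n + 1), is known, so the approximation can be checked. It is not one of the evolutionary models studied in the paper, where the mode and curvature have no closed form and would have to be estimated, for instance by MCMC. All function names are hypothetical.

```python
import math

def log_binom_likelihood(theta, k, n):
    """Log-likelihood of k successes in n Bernoulli trials (requires 0 < theta < 1)."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(theta) + (n - k) * math.log(1.0 - theta)

def laplace_log_marginal(k, n):
    """Laplace approximation to log p(D) under a Uniform(0, 1) prior (requires 0 < k < n)."""
    theta_hat = k / n  # posterior mode; equals the MLE because the prior is flat
    # Negative second derivative of the log-likelihood, evaluated at the mode
    curvature = k / theta_hat**2 + (n - k) / (1.0 - theta_hat)**2
    # log p(D) ~ log L(theta_hat) + (d/2) log(2*pi) - (1/2) log(curvature), with d = 1
    return (log_binom_likelihood(theta_hat, k, n)
            + 0.5 * math.log(2.0 * math.pi)
            - 0.5 * math.log(curvature))

def exact_log_marginal(n):
    """Exact log p(D) for the uniform-prior binomial model, which is log(1 / (n + 1))."""
    return -math.log(n + 1)

def laplace_log_bayes_factor(k, n, theta0=0.5):
    """Approximate log Bayes factor of M1 (theta free) against M0 (theta fixed at theta0)."""
    return laplace_log_marginal(k, n) - log_binom_likelihood(theta0, k, n)

if __name__ == "__main__":
    k, n = 70, 100
    print("Laplace log p(D):", laplace_log_marginal(k, n))
    print("Exact   log p(D):", exact_log_marginal(n))
    print("Laplace log BF (M1 vs M0):", laplace_log_bayes_factor(k, n))
```

For this toy case the approximate and exact log marginal likelihoods agree closely, and the gap narrows as n grows and the posterior becomes more nearly normal.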