Bayesian estimation of divergence times from large sequence alignments.

Affiliation

Department of Statistics, University of Auckland, Auckland, New Zealand.

Publication information

Mol Biol Evol. 2010 Aug;27(8):1768-81. doi: 10.1093/molbev/msq060. Epub 2010 Mar 1.

PMID: 20194424

Abstract

Bayesian estimation of divergence times from molecular sequences relies on sophisticated Markov chain Monte Carlo techniques, and Metropolis-Hastings (MH) samplers have been successfully used in that context. This approach involves heavy computational burdens that can hinder the analysis of large phylogenomic data sets. Reliable estimation of divergence times can also be extremely time consuming, if not impossible, for sequence alignments that convey weak or conflicting phylogenetic signals, emphasizing the need for more efficient sampling methods. This article describes a new approach that estimates the posterior density of substitution rates and node times. The prior distribution of rates accounts for their potential autocorrelation along lineages, whereas priors on node ages are modeled with uniform densities. Also, the likelihood function is approximated by a multivariate normal density. The combination of these components leads to convenient mathematical simplifications, allowing the posterior distribution of rates and times to be estimated using a Gibbs sampling algorithm. The analysis of four real-world data sets shows that this sampler outperforms the standard MH approach and demonstrates the suitability of this new method for analyzing large and/or difficult data sets.
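The abstract gives no equations, so the following is only a toy sketch of why a normal likelihood approximation combined with uniform priors lends itself to Gibbs sampling. It is not the authors' algorithm. It assumes a hypothetical two-parameter problem in which the likelihood is a bivariate normal and each parameter has a uniform prior on an interval; under those assumptions each full conditional is a truncated univariate normal that can be sampled exactly, with no accept/reject step. The autocorrelated rate prior described in the abstract is left out, and all names and numbers are illustrative.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf / inverse cdf


def sample_truncated_normal(mean, sd, low, high, rng):
    """Draw from N(mean, sd^2) restricted to [low, high] by inverting the CDF."""
    p_low = STD_NORMAL.cdf((low - mean) / sd)
    p_high = STD_NORMAL.cdf((high - mean) / sd)
    u = rng.uniform(p_low, p_high)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf strictly inside (0, 1)
    x = mean + sd * STD_NORMAL.inv_cdf(u)
    return min(max(x, low), high)  # guard against rounding at the edges


def gibbs_truncated_bivariate_normal(mu, sd, rho, bounds, n_samples, seed=0):
    """Gibbs sampler for a bivariate normal restricted to a rectangle.

    mu, sd : means and standard deviations of the normal approximation
    rho    : correlation between the two parameters (|rho| < 1)
    bounds : ((low0, high0), (low1, high1)), the uniform-prior intervals
    All of these are hypothetical inputs for illustration.
    """
    rng = random.Random(seed)
    x = [0.5 * (lo + hi) for lo, hi in bounds]  # start at the box centre
    cond_scale = (1.0 - rho * rho) ** 0.5
    samples = []
    for _ in range(n_samples):
        for i in (0, 1):
            j = 1 - i
            # Conditional of x[i] given x[j] under the bivariate normal.
            cond_mean = mu[i] + rho * sd[i] / sd[j] * (x[j] - mu[j])
            cond_sd = sd[i] * cond_scale
            low, high = bounds[i]
            # The uniform prior simply truncates that conditional.
            x[i] = sample_truncated_normal(cond_mean, cond_sd, low, high, rng)
        samples.append((x[0], x[1]))
    return samples


if __name__ == "__main__":
    draws = gibbs_truncated_bivariate_normal(
        mu=(1.0, 2.0), sd=(0.5, 0.8), rho=0.6,
        bounds=((0.5, 1.5), (1.0, 3.5)), n_samples=5000, seed=1,
    )
    mean0 = sum(d[0] for d in draws) / len(draws)
    mean1 = sum(d[1] for d in draws) / len(draws)
    print(f"posterior means: {mean0:.3f}, {mean1:.3f}")
```

Because every update is an exact draw from a full conditional, no proposal is ever rejected, which is one commonly cited reason a Gibbs sampler can mix better than a Metropolis-Hastings sampler on the same target.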
