Center for Evolutionary Functional Genomics, The Biodesign Institute, Arizona State University, USA.
Mol Biol Evol. 2010 Jun;27(6):1289-300. doi: 10.1093/molbev/msq014. Epub 2010 Jan 21.
The rapid expansion of sequence data and the development of statistical approaches that embrace varying evolutionary rates among lineages have encouraged many more investigators to use DNA and protein data to time species divergences. Here, we report results from a systematic evaluation, by means of computer simulation, of the performance of two frequently used relaxed-clock methods for estimating these times and their credibility intervals (CrIs). These relaxed-clock methods allow rates to vary in a phylogeny either randomly over lineages (e.g., BEAST software) or in an autocorrelated fashion (e.g., MultiDivTime software). We applied these methods to analyze sequence data sets simulated using naturally derived parameters (evolutionary rates, sequence lengths, and base substitution patterns) and assuming that clock calibrations are known without error. We find that the estimated times are, on average, close to the true times as long as the assumed model of lineage rate changes matches the actual model. The 95% CrIs also contain the true time for ≥95% of the simulated data sets. However, the use of an incorrect lineage rate model reduces this frequency to 83%, indicating that the relaxed-clock methods are not robust to violations of the underlying lineage rate model. Because these rate models are rarely known a priori and are difficult to detect empirically, we suggest building composite CrIs using the CrIs produced from MultiDivTime and BEAST analyses. These composite CrIs are found to contain the true time for ≥97% of the data sets. Our analyses also verify the usefulness of the common practice of interpreting the congruence of times inferred from different methods as a reflection of the accuracy of time estimates. Overall, our results show that simple strategies can be used to enhance our ability to estimate times and their CrIs when using the relaxed-clock methods.
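The composite CrI idea can be illustrated with a minimal sketch. The abstract does not spell out the construction, so the code below assumes one plausible reading: for each node, the composite interval spans from the smaller of the two lower bounds to the larger of the two upper bounds. The function names (`composite_cri`, `coverage`) and all numbers are hypothetical and chosen only for illustration; they are not results from the study and do not use any BEAST or MultiDivTime API.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound) of a 95% CrI


def composite_cri(cri_a: Interval, cri_b: Interval) -> Interval:
    """Assumed construction: the union envelope of two CrIs for the same node."""
    return (min(cri_a[0], cri_b[0]), max(cri_a[1], cri_b[1]))


def coverage(intervals: List[Interval], true_times: List[float]) -> float:
    """Fraction of intervals that contain the corresponding true time."""
    hits = sum(lo <= t <= hi for (lo, hi), t in zip(intervals, true_times))
    return hits / len(true_times)


# Hypothetical CrIs (in millions of years) for two nodes under each method.
random_rate_cris = [(80.0, 100.0), (40.0, 55.0)]      # e.g., a random-rate analysis
autocorrelated_cris = [(90.0, 120.0), (50.0, 60.0)]   # e.g., an autocorrelated analysis
true_times = [110.0, 45.0]                            # known only in simulation

composites = [
    composite_cri(a, b) for a, b in zip(random_rate_cris, autocorrelated_cris)
]

print(coverage(random_rate_cris, true_times))
print(coverage(autocorrelated_cris, true_times))
print(coverage(composites, true_times))
```

Under this assumed construction a composite interval is never narrower than either input interval, so its coverage cannot be lower than that of either method alone; the cost is reduced precision.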