Department of Parasitology (SWEPAR), National Veterinary Institute and Swedish University of Agricultural Sciences, Uppsala, Sweden.
Evol Bioinform Online. 2008 Mar 18;4:75-95. doi: 10.4137/ebo.s545.
The use of molecular sequence data has increased interest in trying to date evolutionary events, with researchers wanting both an estimate of the divergence time and a confidence interval for that estimate. However, two methodological issues have recently been raised with respect to the precision of the estimates: (i) the time of the ancestral event is over-estimated; and (ii) the confidence interval is asymmetrical. I argue that if the estimates of divergence time are considered to be samples from a lognormal probability distribution, then this would explain both of these problems. This implies that divergence times should be presented using geometric means rather than arithmetic means, both for the estimates and for their confidence intervals. I present analyses based on both computer simulations and empirical data to show that this approach is effective for both single-gene and multiple-gene data sets. Treating divergence time as a lognormal variable thus provides a simple unifying framework for dealing with many of the problems associated with the estimation of divergence (and possibly coalescence) times. Use of this approach (based on geometric means) can, unfortunately, lead to very different biological conclusions compared to the currently used calculation methods (based on arithmetic means).
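The recommendation to summarize divergence times on the log scale can be illustrated with a short Python sketch. It makes three assumptions. First, the estimates are positive and roughly lognormal. Second, the interval is built as mean ± z·SD of the log times (z = 1.96 for a nominal 95% interval) and then back-transformed. Third, the function names `geometric_summary` and `arithmetic_summary` and the simulation parameters are hypothetical. The paper's own interval construction may differ.

Because the arithmetic mean of positive values is never smaller than their geometric mean, an arithmetic summary of lognormal estimates shifts the point estimate upward, which is consistent with the over-estimation described above. The back-transformed interval is symmetric in log time but asymmetric in time.

```python
import math
import random
import statistics


def geometric_summary(times, z=1.96):
    """Summarize positive divergence-time estimates as a lognormal variable.

    Works on the log scale: the centre is the geometric mean exp(mean(log t)),
    and the interval is exp(mean(log t) +/- z * sd(log t)), which is
    symmetric in log time but asymmetric in time itself.
    (Interval construction is an illustrative assumption.)
    """
    if len(times) < 2:
        raise ValueError("need at least two estimates")
    if any(t <= 0 for t in times):
        raise ValueError("divergence times must be positive")
    logs = [math.log(t) for t in times]
    mu = statistics.fmean(logs)
    sd = statistics.stdev(logs)  # sample SD of the log times
    return math.exp(mu), math.exp(mu - z * sd), math.exp(mu + z * sd)


def arithmetic_summary(times, z=1.96):
    """Conventional summary: arithmetic mean +/- z * sample SD (symmetric)."""
    if len(times) < 2:
        raise ValueError("need at least two estimates")
    mean = statistics.fmean(times)
    sd = statistics.stdev(times)
    return mean, mean - z * sd, mean + z * sd


if __name__ == "__main__":
    # Hypothetical simulation: 1000 estimates whose log time is normal with
    # mean log(50) and SD 0.4 (arbitrary illustrative values, not real data).
    rng = random.Random(1)
    estimates = [rng.lognormvariate(math.log(50.0), 0.4) for _ in range(1000)]

    gm, g_lo, g_hi = geometric_summary(estimates)
    am, a_lo, a_hi = arithmetic_summary(estimates)
    print(f"geometric:  {gm:.1f} ({g_lo:.1f}-{g_hi:.1f})")
    print(f"arithmetic: {am:.1f} ({a_lo:.1f}-{a_hi:.1f})")
```

Running the script prints both summaries for the same simulated estimates. The geometric interval extends further above its centre than below it, while the arithmetic interval is symmetric by construction and sits around a larger centre.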