Duchêne Sebastián, Geoghegan Jemma L, Holmes Edward C, Ho Simon Y W
Marie Bashir Institute of Infectious Diseases and Biosecurity, Charles Perkins Centre, Sydney Medical School, University of Sydney, Sydney, NSW 2006, Australia.
School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia.
Bioinformatics. 2016 Nov 15;32(22):3375-3379. doi: 10.1093/bioinformatics/btw421. Epub 2016 Jul 13.
In rapidly evolving pathogens, including viruses and some bacteria, genetic change can accumulate over short time-frames. Accordingly, their sampling times can be used to calibrate molecular clocks, allowing estimation of evolutionary rates. Methods for estimating rates from time-structured data vary in how they treat phylogenetic uncertainty and rate variation among lineages. We compiled 81 virus data sets and estimated nucleotide substitution rates using root-to-tip regression, least-squares dating and Bayesian inference.
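As an illustration of the simplest of the three approaches, root-to-tip regression fits a straight line to each sample's genetic distance from the root against its sampling time. The slope is read as the substitution rate and the x-intercept as the age of the root. The sketch below is a minimal version under stated assumptions: it takes root-to-tip distances as already computed from a rooted tree, it assumes a single constant rate, and the function name and input values are hypothetical rather than taken from the study.

```python
from statistics import mean


def root_to_tip_regression(sampling_times, root_to_tip_distances):
    """Ordinary least-squares fit of root-to-tip distance on sampling time.

    sampling_times: sampling dates (e.g. decimal years), one per sequence.
    root_to_tip_distances: distance from the root to each tip
        (substitutions per site), in the same order.
    Assumes a strict clock and ignores phylogenetic non-independence of tips.
    """
    n = len(sampling_times)
    if n != len(root_to_tip_distances) or n < 3:
        raise ValueError("need at least three paired observations")

    t_bar = mean(sampling_times)
    d_bar = mean(root_to_tip_distances)
    sxx = sum((t - t_bar) ** 2 for t in sampling_times)
    syy = sum((d - d_bar) ** 2 for d in root_to_tip_distances)
    sxy = sum(
        (t - t_bar) * (d - d_bar)
        for t, d in zip(sampling_times, root_to_tip_distances)
    )
    if sxx == 0:
        raise ValueError("all sampling times are identical; no temporal signal")

    slope = sxy / sxx                  # substitutions per site per time unit
    intercept = d_bar - slope * t_bar  # fitted distance at time zero
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else float("nan")
    # The x-intercept estimates the root date; only meaningful for a positive slope.
    root_date = -intercept / slope if slope > 0 else None

    return {
        "rate": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "root_date": root_date,
    }


# Hypothetical, perfectly clock-like data for illustration only.
example_times = [2000.0, 2002.0, 2004.0, 2006.0, 2008.0]
example_distances = [0.010, 0.012, 0.014, 0.016, 0.018]
example_fit = root_to_tip_regression(example_times, example_distances)
```

Because the tips share ancestry, the data points are not independent, so the fitted R² is better treated as a rough indicator of temporal signal than as a formal test statistic.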
Although estimates from these three methods were often congruent, this largely relied on the choice of clock model. In particular, relaxed-clock models tended to produce higher rate estimates than methods that assume constant rates. Discrepancies in rate estimates were also associated with high among-lineage rate variation, and phylogenetic and temporal clustering. These results provide insights into the factors that affect the reliability of rate estimates from time-structured sequence data, emphasizing the importance of clock-model testing.
Contact: sduchene@unimelb.edu.au or garzonsebastian@hotmail.com
Supplementary information: Supplementary data are available at Bioinformatics online.