Birth-death prior on phylogeny and speed dating.

Authors

Akerborg Orjan, Sennblad Bengt, Lagergren Jens

Affiliation

Stockholm Bioinformatics Centre, Albanova, Stockholm University, SE-10691 Stockholm, Sweden.

Publication

BMC Evol Biol. 2008 Mar 4;8:77. doi: 10.1186/1471-2148-8-77.

Abstract

BACKGROUND

In recent years there has been a trend away from the strict molecular clock when dating speciations and other evolutionary events. Explicit modeling of substitution rates and divergence times makes it possible to formulate informative prior distributions for branch lengths. Models with birth-death priors on tree branching and auto-correlated or i.i.d. substitution rates among lineages have been proposed, enabling simultaneous inference of substitution rates and divergence times. This problem has, however, mainly been analysed in the Markov chain Monte Carlo (MCMC) framework, an approach requiring computation times of hours or days when applied to large phylogenies.

RESULTS

We demonstrate that a hill-climbing maximum a posteriori (MAP) adaptation of the MCMC scheme results in a considerable gain in computational efficiency. We also demonstrate that a novel dynamic programming (DP) algorithm for branch length factorization, useful in both the hill-climbing and the MCMC setting, further reduces computation time. For the problem of inferring rate and time parameters on a fixed tree, we perform simulations, comparisons between hill-climbing and MCMC on a plant rbcL gene dataset, and a dating analysis on an animal mtDNA dataset, showing that our methodology enables efficient, highly accurate analysis of very large trees. Datasets requiring a computation time of several days with MCMC can be accurately analysed with our MAP algorithm in less than a minute. From the results of our example analyses, we conclude that our methodology generally avoids getting trapped early in local optima. For the cases where this can nevertheless be a problem, for instance when we infer the tree topology in addition to the parameters, we show that the problem can be evaded by using a simulated-annealing-like (SAL) method in which we favour tree swaps early in the inference and shift the focus towards rate and time parameter changes later on.
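The idea of turning an MCMC scheme into a hill-climbing MAP search can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy model with independent branches, a Gaussian stand-in for the sequence likelihood, an i.i.d. log-normal rate prior, and an exponential prior on branch durations in place of a real birth-death prior. The function names and all parameter values (`sigma`, `rate_sd`, `birth_rate`) are hypothetical. The only change from an MCMC sampler is the acceptance rule: a proposal is kept only if it increases the posterior.

```python
import math
import random

def log_posterior(rates, times, observed, sigma=0.05, rate_sd=0.5, birth_rate=1.0):
    """Unnormalised log posterior of a toy rate/time model (illustrative only)."""
    lp = 0.0
    for r, t, l in zip(rates, times, observed):
        # Likelihood stand-in: observed branch length ~ Normal(r * t, sigma).
        lp += -0.5 * ((l - r * t) / sigma) ** 2
        # i.i.d. log-normal prior on rates, centred on 1 (includes the -log r term).
        lp += -0.5 * (math.log(r) / rate_sd) ** 2 - math.log(r)
        # Exponential prior on durations, an assumed stand-in for a birth-death prior.
        lp += -birth_rate * t
    return lp

def hill_climb_map(observed, steps=20000, seed=0):
    """Greedy MAP search using MCMC-style single-parameter proposals."""
    rng = random.Random(seed)
    n = len(observed)
    rates = [1.0] * n
    times = [1.0] * n
    best = log_posterior(rates, times, observed)
    for _ in range(steps):
        # Pick either a rate or a time parameter to perturb.
        vec = rates if rng.random() < 0.5 else times
        i = rng.randrange(n)
        old = vec[i]
        # Multiplicative proposal keeps the parameter positive.
        vec[i] = old * math.exp(rng.gauss(0.0, 0.1))
        new = log_posterior(rates, times, observed)
        if new > best:
            best = new    # accept only strict improvements
        else:
            vec[i] = old  # otherwise revert the proposal
    return rates, times, best

if __name__ == "__main__":
    rates, times, best = hill_climb_map([0.2, 0.5, 1.3])
    print(rates, times, best)
```

Because every accepted move strictly increases the posterior, the search is fast but can stall in a local optimum, which is the situation the simulated-annealing-like variant in the abstract is meant to address.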

CONCLUSION

Our contribution leaves the field open for fast and accurate dating analysis of nucleotide sequence data. Modeling branch substitution rates and divergence times separately allows us to include birth-death priors on the times without the assumption of a molecular clock. The methodology is easily adapted to take data from fossil records into account, and it can be used together with a broad range of rate and substitution models.
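The separation of rates and times can be written out in a generic form. The notation below is an assumption for illustration and is not taken from the paper: $D$ is the sequence data, $r_e$ the substitution rate on branch $e$, $\Delta t_e$ the time spanned by that branch, and $\ell_e$ the resulting branch length in expected substitutions per site.

```latex
\begin{align}
  \ell_e &= r_e \,\Delta t_e, \\
  p(r, t \mid D) &\propto p(D \mid \ell)\; p(r)\; p(t).
\end{align}
```

In this reading, $p(t)$ is where a birth-death prior on divergence times would enter, and $p(r)$ is the rate model (i.i.d. or auto-correlated). A strict molecular clock corresponds to the special case $r_e = r$ for every branch.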

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1492/2270800/6259adc903d4/1471-2148-8-77-1.jpg
