Calculating Higher-Order Moments of Phylogenetic Stochastic Mapping Summaries in Linear Time.

Authors

Dhar Amrit, Minin Vladimir N

Affiliations

1 Department of Statistics, University of Washington, Seattle, Washington.

2 Department of Biology, University of Washington, Seattle, Washington.

Publication information

J Comput Biol. 2017 May;24(5):377-399. doi: 10.1089/cmb.2016.0172. Epub 2017 Feb 8.

Abstract

Stochastic mapping is a simulation-based method for probabilistically mapping substitution histories onto phylogenies according to continuous-time Markov models of evolution. This technique can be used to infer properties of the evolutionary process on the phylogeny and, unlike parsimony-based mapping, conditions on the observed data to randomly draw substitution mappings that do not necessarily require the minimum number of events on a tree. Most stochastic mapping applications simulate substitution mappings only to estimate the mean and/or variance of two commonly used mapping summaries: the number of particular types of substitutions (labeled substitution counts) and the time spent in a particular group of states (labeled dwelling times) on the tree. Fast, simulation-free algorithms for calculating the mean of stochastic mapping summaries exist. Importantly, these algorithms scale linearly in the number of tips/leaves of the phylogenetic tree. However, to our knowledge, no such algorithm exists for calculating higher-order moments of stochastic mapping summaries. We present one such simulation-free dynamic programming algorithm that calculates prior and posterior mapping variances and scales linearly in the number of phylogeny tips. Our procedure suggests a general framework that can be used to efficiently compute higher-order moments of stochastic mapping summaries without simulations. We demonstrate the usefulness of our algorithm by extending previously developed statistical tests for rate variation across sites and for detecting evolutionarily conserved regions in genomic sequences.
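The contrast between simulated and simulation-free mapping summaries can be illustrated on the smallest possible case. The sketch below is not the authors' tree-wide dynamic programming algorithm. It assumes a single branch of length `t` and a two-state symmetric Markov model with substitution rate `rate` in each direction, both chosen here only for illustration. Under those assumptions the substitution count is Poisson-distributed and the end state matches the start state exactly when the count is even, which gives closed-form conditional moments to compare against rejection-sampled mappings. All function names are hypothetical.

```python
import math
import random
import statistics


def simulate_jump_count(rate, t, rng):
    """Simulate a two-state symmetric CTMC on [0, t] and return its jump count."""
    elapsed = rng.expovariate(rate)  # waiting time to the first substitution
    jumps = 0
    while elapsed < t:
        jumps += 1
        elapsed += rng.expovariate(rate)
    return jumps


def sample_conditional_counts(rate, t, same_end_state, n_samples, seed=0):
    """Rejection-sample substitution counts conditional on the branch endpoints.

    In a two-state chain the end state equals the start state exactly when
    the number of jumps is even, so conditioning reduces to a parity check.
    """
    rng = random.Random(seed)
    kept = []
    while len(kept) < n_samples:
        jumps = simulate_jump_count(rate, t, rng)
        if (jumps % 2 == 0) == same_end_state:
            kept.append(jumps)
    return kept


def exact_conditional_moments(rate, t, same_end_state):
    """Closed-form mean and variance of the count given the endpoints.

    With x = rate * t, the Poisson series split by parity gives
    E[N | same] = x * tanh(x), E[N | different] = x / tanh(x),
    and E[N (N - 1) | endpoints] = x**2 in both cases.
    """
    x = rate * t
    ratio = math.tanh(x) if same_end_state else 1.0 / math.tanh(x)
    mean = x * ratio
    variance = x * x + mean - mean * mean
    return mean, variance


if __name__ == "__main__":
    counts = sample_conditional_counts(1.0, 1.0, True, 20000, seed=1)
    mean, variance = exact_conditional_moments(1.0, 1.0, True)
    print("simulated mean, variance:",
          statistics.fmean(counts), statistics.variance(counts))
    print("exact mean, variance:", mean, variance)
```

The simulated estimates carry Monte Carlo error that shrinks only as the number of sampled mappings grows, whereas the closed-form values cost a handful of arithmetic operations. Extending this idea from one branch to a whole phylogeny, and to general substitution models, is the problem the abstract addresses.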
