Dhar Amrit, Minin Vladimir N
1 Department of Statistics, University of Washington, Seattle, Washington.
2 Department of Biology, University of Washington, Seattle, Washington.
J Comput Biol. 2017 May;24(5):377-399. doi: 10.1089/cmb.2016.0172. Epub 2017 Feb 8.
Stochastic mapping is a simulation-based method for probabilistically mapping substitution histories onto phylogenies according to continuous-time Markov models of evolution. This technique can be used to infer properties of the evolutionary process on the phylogeny and, unlike parsimony-based mapping, conditions on the observed data to randomly draw substitution mappings that do not necessarily require the minimum number of events on a tree. Most stochastic mapping applications simulate substitution mappings only to estimate the mean and/or variance of two commonly used mapping summaries: the number of particular types of substitutions (labeled substitution counts) and the time spent in a particular group of states (labeled dwelling times) on the tree. Fast, simulation-free algorithms for calculating the mean of stochastic mapping summaries exist. Importantly, these algorithms scale linearly in the number of tips/leaves of the phylogenetic tree. However, to our knowledge, no such algorithm exists for calculating higher-order moments of stochastic mapping summaries. We present one such simulation-free dynamic programming algorithm that calculates prior and posterior mapping variances and scales linearly in the number of phylogeny tips. Our procedure suggests a general framework that can be used to efficiently compute higher-order moments of stochastic mapping summaries without simulations. We demonstrate the usefulness of our algorithm by extending previously developed statistical tests for rate variation across sites and for detecting evolutionarily conserved regions in genomic sequences.
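To make the contrast between simulation-based and simulation-free summaries concrete, the following is a minimal toy sketch and not the dynamic programming algorithm of the paper. It assumes a single branch rather than a tree, a symmetric two-state chain with a common jump rate, and identical start and end states. Under those assumptions the number of substitutions is a Poisson count conditioned to be even, so with mu = rate * t its mean is mu * tanh(mu) and its second moment is mu^2 + mu * tanh(mu). All function names and parameter values here are hypothetical and chosen for illustration only.

```python
import math
import random


def simulate_path(rate, t, start, rng):
    """Simulate a symmetric two-state CTMC along one branch of length t.

    Returns (end_state, substitution_count, dwelling_time_in_state_0).
    """
    state, clock, jumps, dwell0 = start, 0.0, 0, 0.0
    while True:
        wait = rng.expovariate(rate)  # exponential waiting time to the next jump
        if clock + wait >= t:
            # The branch ends before the next jump occurs.
            if state == 0:
                dwell0 += t - clock
            return state, jumps, dwell0
        if state == 0:
            dwell0 += wait
        clock += wait
        state = 1 - state  # only one other state to jump to
        jumps += 1


def sample_conditional(rate, t, start, end, n, rng):
    """Stochastic mapping by rejection: keep only paths that end in `end`."""
    kept = []
    while len(kept) < n:
        final, jumps, dwell0 = simulate_path(rate, t, start, rng)
        if final == end:
            kept.append((jumps, dwell0))
    return kept


def exact_count_moments(rate, t):
    """Closed-form mean and variance of the count when start == end (toy model only)."""
    mu = rate * t
    mean = mu * math.tanh(mu)
    second_moment = mu * mu + mean
    return mean, second_moment - mean * mean


rng = random.Random(1)  # fixed seed so the run is repeatable
rate, t = 1.0, 1.0      # illustrative values, not taken from the paper
paths = sample_conditional(rate, t, 0, 0, 20000, rng)
counts = [jumps for jumps, _ in paths]
sim_mean = sum(counts) / len(counts)
sim_var = sum((c - sim_mean) ** 2 for c in counts) / (len(counts) - 1)
exact_mean, exact_var = exact_count_moments(rate, t)
print("mean: simulated", sim_mean, "exact", exact_mean)
print("variance: simulated", sim_var, "exact", exact_var)
```

The simulated moments approach the closed-form values only as the number of sampled mappings grows, and rejection sampling discards every path that misses the observed end state. That cost is what simulation-free calculations avoid. On a full phylogeny with a general rate matrix no such one-line formula is available, which is the setting the abstract addresses.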