Pasqualin Diego, Barbeitos Marcos, Silva Fabiano
Departamento de Informática, Universidade Federal do Paraná, Caixa Postal 19081, Curitiba, PR, 81531-980, Brazil.
Departamento de Zoologia, Universidade Federal do Paraná, Caixa Postal 19020, Curitiba, PR, 81531-990, Brazil.
BMC Bioinformatics. 2017 Feb 22;18(1):123. doi: 10.1186/s12859-017-1554-7.
Stochastic mapping is frequently used in comparative biology to simulate character evolution, enabling the probabilistic computation of statistics such as the number of state transitions along a tree and the distribution of states at its internal nodes. Common implementations rely on continuous-time Markov chain simulations whose parameters are difficult to adjust and which are subject to inherent inaccuracy. Researchers must therefore run a large number of simulations to obtain adequate estimates. Execution time tends to be relatively short when simulations are performed on a single tree assumed to be the "true" topology, but it may become an issue when analyses are conducted on many trees, such as those that make up the posterior distributions obtained via Bayesian phylogenetic inference. Working with such distributions is preferable to working with a single tree because they allow phylogenetic uncertainty to be integrated into parameter estimation. In such cases, detailed character mapping becomes less important than parameter integration across topologies. Here, we present an R-based implementation (SFREEMAP) of an analytical approach that yields accurate, per-branch expectations of the number of state transitions and of dwelling times. We also introduce an intuitive way of visualizing the results: we integrate over the posterior distribution and summarize the parameters onto a target reference topology (such as a consensus or MAP tree) provided by the user.
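To make the contrast between simulation and analytical expectation concrete, the following Python sketch treats a single branch under a two-state continuous-time Markov chain. It is not SFREEMAP's code and does not use its API: the function names, the rate parameters `a` (0 to 1) and `b` (1 to 0), and the use of rejection sampling and midpoint integration are all assumptions chosen for illustration. It computes the expected number of state transitions on a branch of length `t`, conditioned on the states at both ends, once by integrating the standard expression built from transition probabilities and once by averaging over simulated histories.

```python
import math
import random


def transition_probs(a, b, t):
    """Closed-form P(t) for a two-state CTMC with rates 0->1 = a and 1->0 = b."""
    total = a + b
    decay = math.exp(-total * t)
    pi0, pi1 = b / total, a / total  # stationary distribution
    return [[pi0 + pi1 * decay, pi1 - pi1 * decay],
            [pi0 - pi0 * decay, pi1 + pi0 * decay]]


def expected_transitions(a, b, t, start, end, steps=2000):
    """Analytical E[number of jumps | X(0)=start, X(t)=end].

    Integrates sum_{i != j} P[start][i](u) * q_ij * P[j][end](t - u) over
    u in [0, t] with the midpoint rule, then divides by P[start][end](t).
    """
    rates = {(0, 1): a, (1, 0): b}
    h = t / steps
    integral = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        before = transition_probs(a, b, u)
        after = transition_probs(a, b, t - u)
        for (i, j), q in rates.items():
            integral += before[start][i] * q * after[j][end] * h
    return integral / transition_probs(a, b, t)[start][end]


def simulate_transitions(a, b, t, start, end, n_accept, rng):
    """Monte Carlo estimate of the same quantity by rejection sampling.

    Histories are simulated forward from `start`; only those that finish
    in `end` are kept, and their jump counts are averaged.
    """
    leave_rate = [a, b]  # rate of leaving state 0 and state 1
    total_jumps = 0
    accepted = 0
    while accepted < n_accept:
        state, clock, jumps = start, 0.0, 0
        while True:
            clock += rng.expovariate(leave_rate[state])
            if clock >= t:
                break
            state = 1 - state
            jumps += 1
        if state == end:
            total_jumps += jumps
            accepted += 1
    return total_jumps / n_accept


if __name__ == "__main__":
    rng = random.Random(0)
    exact = expected_transitions(1.0, 0.5, 1.0, start=0, end=1)
    approx = simulate_transitions(1.0, 0.5, 1.0, 0, 1, n_accept=20000, rng=rng)
    print(f"analytical: {exact:.4f}  simulated: {approx:.4f}")
```

The analytical value needs one deterministic integration per branch, whereas the simulated value carries Monte Carlo error that shrinks only with the square root of the number of accepted histories. Repeating the simulation over every branch of every tree in a posterior sample is where that cost accumulates.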
We benchmarked SFREEMAP's performance against make.simmap, a popular R-based implementation of stochastic mapping. In line with theoretical expectations, SFREEMAP outperformed make.simmap in every experiment, reducing the computation time for relatively modest datasets from hours to minutes. Using simulated data, we also show that SFREEMAP's estimates are not only similar to those obtained by averaging across make.simmap mappings but also more accurate. We illustrate our visualization strategy using previously published data on the evolution of coloniality in scleractinian corals.
SFREEMAP is an accurate and fast alternative to ancestral state reconstruction via simulation-based stochastic mapping.