Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany.
San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093-0505, USA.
Bioinformatics. 2022 Mar 4;38(6):1741-1742. doi: 10.1093/bioinformatics/btab863.
The assessment of novel phylogenetic models and inference methods is routinely conducted via experiments on simulated as well as empirical data. When generating synthetic data, it is often unclear how to set simulation parameters for the models and how to generate trees that appropriately reflect empirical model parameter distributions and tree shapes. As a solution, we present and make available a new database called 'RAxML Grove', currently comprising more than 60 000 inferred trees and the corresponding model parameter estimates from fully anonymized empirical datasets that were analyzed using RAxML and RAxML-NG on two web servers. We also describe and make available two simple applications of RAxML Grove to exemplify its usage and highlight its utility for designing realistic simulation studies and analyzing empirical model parameter and tree shape distributions.
RAxML Grove is freely available at https://github.com/angtft/RAxMLGrove.
Supplementary data are available at Bioinformatics online.