Biodiversity Research Centre, University of British Columbia, Vancouver, BC, V6T1Z4, Canada.
Department of Zoology, University of British Columbia, Vancouver, BC, V6T1Z4, Canada.
Bioinformatics. 2018 Mar 15;34(6):1053-1055. doi: 10.1093/bioinformatics/btx701.
Biodiversity databases now comprise hundreds of thousands of sequences and trait records. For example, the Open Tree of Life includes over 1 491 000 metazoan and over 300 000 bacterial taxa. These data provide unique opportunities for analysis of phylogenetic trait distribution and reconstruction of ancestral biodiversity. However, existing tools for comparative phylogenetics scale poorly to such large trees, to the point of being almost unusable.
Here we present a new R package, named 'castor', for comparative phylogenetics on large trees comprising millions of tips. On large trees castor is often 100-1000 times faster than existing tools.
The castor source code, compiled binaries, documentation and usage examples are freely available at the Comprehensive R Archive Network (CRAN).
Supplementary data are available at Bioinformatics online.