Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany.
Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe 76131, Germany.
Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac832.
Missing data and incomplete lineage sorting (ILS) are two major obstacles to accurate species tree inference. Gene tree summary methods such as ASTRAL and ASTRID have been developed to account for ILS. However, they can be severely affected by high levels of missing data.
We present Asteroid, a novel algorithm that infers an unrooted species tree from a set of unrooted gene trees. We show on both empirical and simulated datasets that Asteroid is substantially more accurate than ASTRAL and ASTRID for very high proportions (>80%) of missing data. Asteroid is several orders of magnitude faster than ASTRAL for datasets that contain thousands of genes. It also offers parallelization, support value computation, and handling of multi-copy and multifurcating gene trees.
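To make the ">80% missing data" regime concrete, the following minimal sketch shows one plausible way to quantify missing data in a set of gene trees: the fraction of (gene tree, species) pairs in which the species is absent. This is an illustrative assumption, not necessarily the definition used by the authors. The function names `leaf_labels` and `missing_fraction` are hypothetical, and the sketch assumes single-copy Newick trees whose leaves are labelled directly with unquoted species names.

```python
import re


def leaf_labels(newick: str) -> set[str]:
    """Return the leaf labels of a Newick string (unquoted labels only)."""
    # A leaf label directly follows '(' or ','; internal node labels and
    # support values follow ')' and are therefore not matched.
    return set(re.findall(r"[(,]\s*([^(),:;\s]+)", newick))


def missing_fraction(gene_trees: list[str]) -> float:
    """Fraction of (gene tree, species) pairs where the species is absent."""
    if not gene_trees:
        raise ValueError("at least one gene tree is required")
    taxa_per_tree = [leaf_labels(tree) for tree in gene_trees]
    # The species set is assumed to be the union of all leaf labels.
    all_species = set().union(*taxa_per_tree)
    total_pairs = len(all_species) * len(gene_trees)
    present_pairs = sum(len(taxa) for taxa in taxa_per_tree)
    return 1.0 - present_pairs / total_pairs


# Hypothetical toy input: four species, three gene trees.
trees = ["((A,B),(C,D));", "(A:0.1,(B:0.2,C:0.3)90:0.1);", "(A,B);"]
print(missing_fraction(trees))
```

In this toy input, 9 of the 12 possible (gene tree, species) pairs are present, so a quarter of the data is missing. The regime highlighted in the abstract corresponds to values above 0.8 under this measure.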
Asteroid is freely available at https://github.com/BenoitMorel/Asteroid.
Supplementary data are available at Bioinformatics online.