IEEE/ACM Trans Comput Biol Bioinform. 2018 Sep-Oct;15(5):1738-1747. doi: 10.1109/TCBB.2017.2757930. Epub 2017 Sep 29.
Species tree reconstruction from genomic data is increasingly performed using methods that account for sources of gene tree discordance such as incomplete lineage sorting. One popular method for reconstructing species trees from unrooted gene tree topologies is ASTRAL. In this paper, we derive theoretical sample complexity results for the number of genes required by ASTRAL to guarantee reconstruction of the correct species tree with high probability. We also validate those theoretical bounds in a simulation study. Our results indicate that ASTRAL requires $O(f^{-2} \log n)$ gene trees to reconstruct the species tree correctly with high probability, where $n$ is the number of species and $f$ is the length of the shortest branch in the species tree. Our simulations, some under the anomaly zone, show trends consistent with the theoretical bounds and also provide some practical insights on the conditions where ASTRAL works well.
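To give a feel for how a bound that scales with the inverse square of the shortest branch length and the logarithm of the number of species can arise, the sketch below works through a simplified back-of-the-envelope argument. It is not the derivation used in the paper and does not model ASTRAL's optimization. It assumes error-free, independent gene trees under the multispecies coalescent, where an unrooted quartet with internal branch length f (in coalescent units) matches the species tree with probability 1 - (2/3)e^{-f}. It then applies Hoeffding's inequality and a union bound over all quartets to find a number of genes after which, with probability at least 1 - delta, the most frequent topology of every quartet is the correct one. The function names and the default delta are hypothetical choices made for illustration.

```python
import math


def quartet_match_probability(f: float) -> float:
    """Probability that an unrooted gene tree quartet matches the species
    tree under the multispecies coalescent, for an internal branch of
    length f in coalescent units."""
    if f < 0:
        raise ValueError("branch length must be non-negative")
    return 1.0 - (2.0 / 3.0) * math.exp(-f)


def genes_for_all_quartets(f: float, n: int, delta: float = 0.05) -> int:
    """Illustrative sufficient number of genes so that, with probability at
    least 1 - delta, the correct topology is the most frequent one for
    every quartet of species (Hoeffding plus union bound).

    f     : shortest internal branch length (assumed > 0)
    n     : number of species (assumed >= 4)
    delta : allowed failure probability (hypothetical default)
    """
    if f <= 0 or n < 4 or not 0 < delta < 1:
        raise ValueError("need f > 0, n >= 4 and 0 < delta < 1")
    # Gap between the matching topology and either alternative:
    # (1 - 2/3 e^-f) - (1/3 e^-f) = 1 - e^-f, which is about f for small f.
    gap = 1.0 - math.exp(-f)
    # Two wrong topologies per quartet, C(n, 4) quartets in total.
    bad_events = 2 * math.comb(n, 4)
    # Hoeffding for differences in [-1, 1]: P(wrong wins) <= exp(-m gap^2 / 2).
    return math.ceil(2.0 * math.log(bad_events / delta) / gap ** 2)


if __name__ == "__main__":
    for f in (0.2, 0.1, 0.05):
        for n in (10, 100, 1000):
            print(f, n, genes_for_all_quartets(f, n))
```

Because the gap is roughly f for short branches and the number of quartets is polynomial in n, the value returned grows like f^{-2} log n, which mirrors the shape of the stated bound while remaining only a heuristic illustration.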