Bayzid Md Shamsuzzoha, Hunt Tyler, Warnow Tandy
BMC Genomics. 2014;15 Suppl 6(Suppl 6):S7. doi: 10.1186/1471-2164-15-S6-S7. Epub 2014 Oct 17.
As the number of newly sequenced genomes grows rapidly, species tree inference from multiple genes has become a basic bioinformatics task in comparative and evolutionary biology. However, accurate species tree estimation is difficult in the presence of gene tree discordance, which is often due to incomplete lineage sorting (ILS), a process modelled by the multi-species coalescent. Several highly accurate coalescent-based species tree estimation methods have been developed over the last decade, including MP-EST. However, the running time of MP-EST increases rapidly as the number of species grows.
We present divide-and-conquer techniques that improve the scalability of MP-EST so that it can run efficiently on large datasets. Surprisingly, these techniques also improve the accuracy of species trees estimated by MP-EST, as our study shows on a collection of simulated and biological datasets.
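To illustrate the general idea of dividing a large taxon set into smaller problems, here is a minimal Python sketch that splits an ordered list of species into overlapping subsets of bounded size. It is an illustrative simplification, not the decomposition used in the study: the function name `overlapping_subsets`, the parameters `max_size` and `overlap`, and the sliding-window strategy are all assumptions made for this example. In a full divide-and-conquer pipeline, each subset would then be analyzed separately (for example with MP-EST) and the resulting subset trees combined into a tree on all species; neither step is shown here.

```python
from typing import List, Sequence


def overlapping_subsets(taxa: Sequence[str], max_size: int, overlap: int) -> List[List[str]]:
    """Split an ordered taxon list into windows of at most `max_size` taxa.

    Consecutive windows share `overlap` taxa, so that trees estimated on
    neighbouring subsets have common leaves to guide a later merge step.
    (Hypothetical helper for illustration only.)
    """
    if max_size <= 0 or not 0 <= overlap < max_size:
        raise ValueError("need max_size > 0 and 0 <= overlap < max_size")
    if len(taxa) <= max_size:
        # Small enough to analyze directly; no division needed.
        return [list(taxa)]

    step = max_size - overlap  # how far each window advances
    subsets: List[List[str]] = []
    start = 0
    while True:
        subsets.append(list(taxa[start:start + max_size]))
        if start + max_size >= len(taxa):
            break  # this window reached the end of the list
        start += step
    return subsets


# Example: ten hypothetical species, subsets of four, sharing two taxa.
taxa = [f"sp{i}" for i in range(10)]
subsets = overlapping_subsets(taxa, max_size=4, overlap=2)
for subset in subsets:
    print(subset)
```

Bounding the subset size is what limits the cost of each individual analysis, while the shared taxa give a merge step something to align on. A realistic decomposition would normally choose subsets using a guide tree, so that closely related species end up together, instead of relying on list order.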