Department of Statistics, Pennsylvania State University.
Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University.
Genome Biol Evol. 2020 Feb 1;12(2):3977-3995. doi: 10.1093/gbe/evaa022.
Though large multilocus genomic data sets have led to overall improvements in phylogenetic inference, they have posed the new challenge of addressing conflicting signals across the genome. In particular, ancestral population structure, which has been uncovered in a number of diverse species, can skew gene tree frequencies, thereby hindering the performance of species tree estimators. Here we develop a novel maximum likelihood method, termed TASTI (Taxa with Ancestral structure Species Tree Inference), that can infer phylogenies under such scenarios, and find that it has increasing accuracy with increasing numbers of input gene trees, contrasting with the relatively poor performances of methods not tailored for ancestral structure. Moreover, we propose a supertree approach that allows TASTI to scale computationally with increasing numbers of input taxa. We use genetic simulations to assess TASTI's performance in the three- and four-taxon settings and demonstrate the application of TASTI on a six-species Afrotropical mosquito data set. Finally, we have implemented TASTI in an open-source software package for ease of use by the scientific community.
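To make the phrase "gene tree frequencies" concrete, the following is a minimal sketch of the three-taxon baseline that ancestral structure departs from. It assumes the standard multispecies coalescent with no ancestral structure, so it is not TASTI's likelihood and is not taken from the TASTI software. The function names `msc_topology_probs`, `mle_internal_branch`, and `discordant_asymmetry` are hypothetical and exist only for this illustration. Under this baseline the concordant topology has probability 1 - (2/3)e^(-t) and each discordant topology has probability (1/3)e^(-t), where t is the internal branch length in coalescent units.

```python
import math


def msc_topology_probs(t):
    """Three-taxon gene tree topology probabilities under the standard
    multispecies coalescent (assumption: no ancestral structure).

    t is the internal branch length in coalescent units.
    Returns (concordant, discordant_1, discordant_2).
    """
    discordant = math.exp(-t) / 3.0
    return (1.0 - 2.0 * discordant, discordant, discordant)


def mle_internal_branch(n_concordant, n_disc1, n_disc2):
    """Maximum likelihood estimate of t from topology counts, assuming the
    species tree matches the topology counted in n_concordant.

    The multinomial likelihood with probabilities (1 - 2q, q, q) is
    maximized at 2q = pooled discordant fraction, and 3q = exp(-t).
    """
    total = n_concordant + n_disc1 + n_disc2
    p_disc = (n_disc1 + n_disc2) / total
    if p_disc == 0:
        return math.inf  # no discordance observed: estimate is unbounded
    # Clamp at zero because t cannot be negative (p_disc above 2/3).
    return max(0.0, -math.log(1.5 * p_disc))


def discordant_asymmetry(n_disc1, n_disc2):
    """Signed imbalance between the two discordant topology counts.

    The baseline model expects a value near zero; a large magnitude
    suggests the frequencies are skewed relative to that model.
    """
    total = n_disc1 + n_disc2
    return 0.0 if total == 0 else (n_disc1 - n_disc2) / total
```

For example, `mle_internal_branch(80, 10, 10)` pools the 20 discordant trees out of 100 and returns -ln(0.3), while `discordant_asymmetry(15, 5)` returns 0.5, an imbalance the baseline model does not predict. A method tailored to ancestral structure, as the abstract describes, has to account for such skew instead of treating the two discordant topologies as exchangeable.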