Computer Science Department, Rice University, Houston, TX.
Mol Biol Evol. 2020 Jun 1;37(6):1809-1818. doi: 10.1093/molbev/msaa045.
Species tree inference from multilocus data has emerged as a powerful paradigm in the postgenomic era, both for the accuracy of the species trees it produces and for elucidating the processes that shaped the evolutionary history. Bayesian methods for species tree inference are desirable in this area because they have been shown not only to yield accurate estimates, but also to naturally provide measures of confidence in those estimates. However, the heavy computational requirements of Bayesian inference have limited the applicability of such methods to very small data sets. In this article, we show that the computational efficiency of Bayesian inference under the multispecies coalescent can be improved in practice by restricting the space of gene trees explored during the random walk, without sacrificing accuracy as measured by various metrics. The idea is to first infer constraints on the trees of the individual loci in the form of unresolved gene trees, and then to restrict the sampler to consider only resolutions of the constrained trees. We demonstrate the improvements gained by such an approach on both simulated and biological data.
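To make the notion of "resolutions of the constrained trees" concrete, the following is a minimal sketch of the membership test a restricted sampler would need: a proposed gene tree is admissible only if it is a binary refinement of the unresolved constraint tree for that locus. It is an illustration, not the authors' implementation. It assumes rooted trees encoded as nested Python tuples with leaf names as strings, and the function names (`leaf_set`, `clades`, `is_binary`, `is_resolution`) are hypothetical.

```python
def leaf_set(tree):
    """Return the leaf names under a tree (a leaf is a str, an internal node a tuple)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree):
    """Return the clades of all internal nodes, each as a frozenset of leaf names."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def is_binary(tree):
    """True if every internal node has exactly two children."""
    if isinstance(tree, str):
        return True
    return len(tree) == 2 and all(is_binary(child) for child in tree)


def is_resolution(candidate, constraint):
    """True if `candidate` is a fully resolved tree that refines `constraint`.

    A rooted tree refines another on the same leaves exactly when it
    contains every clade of the less resolved tree.
    """
    return (
        is_binary(candidate)
        and leaf_set(candidate) == leaf_set(constraint)
        and clades(constraint) <= clades(candidate)
    )


# Constraint with an unresolved clade {A, B, C} that excludes D.
constraint = (("A", "B", "C"), "D")
accepted = ((("A", "B"), "C"), "D")   # resolves the polytomy, keeps {A, B, C}
rejected = ((("A", "D"), "B"), "C")   # breaks the clade {A, B, C}

print(is_resolution(accepted, constraint))
print(is_resolution(rejected, constraint))
```

In a sampler, a check of this kind could be used to reject proposed gene tree topologies that violate the constraint, although a practical implementation would more likely design proposals that never leave the constrained space.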