Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA.
Department of Mathematics, University of Wisconsin-Madison, Madison, Wisconsin, USA.
J Comput Biol. 2023 Nov;30(11):1146-1181. doi: 10.1089/cmb.2023.0185. Epub 2023 Oct 30.
We address the problem of rooting an unrooted species tree given a set of unrooted gene trees, under the assumption that gene trees evolve within the model species tree under the multispecies coalescent (MSC) model. Quintet Rooting (QR) is a polynomial time algorithm that was recently proposed for this problem, which is based on the theory developed by Allman, Degnan, and Rhodes that proves the identifiability of rooted 5-taxon trees from unrooted gene trees under the MSC. However, although QR had good accuracy in simulations, its statistical consistency was left as an open problem. We present QR-STAR, a variant of QR with an additional step and a different cost function, and prove that it is statistically consistent under the MSC. Moreover, we derive sample complexity bounds for QR-STAR and show that a particular variant of it based on "short quintets" has polynomial sample complexity. Finally, our simulation study under a variety of model conditions shows that QR-STAR matches or improves on the accuracy of QR. QR-STAR is available in open-source form on github.
我们解决了在多物种合并(MSC)模型下,给定一组无根基因树,为无树根物种树定位的问题。五元组定位(QR)是最近提出的一种解决这个问题的多项式时间算法,它基于 Allman、Degnan 和 Rhodes 发展的理论,证明了在 MSC 下从无根基因树中可识别有根的 5 分类群树。然而,尽管 QR 在模拟中具有很好的准确性,但它的统计一致性仍是一个悬而未决的问题。我们提出了 QR-STAR,QR 的一个变体,它有一个额外的步骤和一个不同的代价函数,并证明它在 MSC 下是统计一致的。此外,我们为 QR-STAR 推导出了样本复杂度界,并表明基于“短五元组”的一个特殊变体具有多项式样本复杂度。最后,我们在各种模型条件下的模拟研究表明,QR-STAR 与 QR 的准确性相匹配或有所提高。QR-STAR 可在 github 上以开源形式获得。