Department of Mathematics, University of California, Riverside, 900 University Avenue, Riverside, CA, 92521, USA.
Department of Mathematics, University of Wisconsin-Madison, 480 Lincoln Drive, Madison, WI, 53706-1388, USA.
Bull Math Biol. 2024 Jul 12;86(9):106. doi: 10.1007/s11538-024-01340-x.
Maximum likelihood estimation is among the most widely used methods for inferring phylogenetic trees from sequence data. This paper computes solutions to the maximum likelihood problem for 3-leaf trees under the 2-state symmetric mutation model (CFN model). Our main result is a closed-form solution to the maximum likelihood problem for unrooted 3-leaf trees, given generic data. This result characterizes all of the ways that a maximum likelihood estimate can fail to exist for generic data, and it provides theoretical validation for predictions made in Parks and Goldman (Syst Biol 63(5):798-811, 2014). Our proof uses two kinds of results. The first are classical tools for studying group-based phylogenetic models, such as Hadamard conjugation and reparameterization in terms of Fourier coordinates. The second are more recent results on the semi-algebraic constraints of the CFN model. To put these results into practice, we also give a complete characterization of genericity, which yields a test for whether given data are generic.
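The Fourier-coordinate reparameterization mentioned above can be illustrated on the unrooted 3-leaf CFN tree. The following is a minimal sketch under stated assumptions, not the paper's closed-form maximum likelihood solution. It assumes the common convention θ_i = 1 - 2p_i, where p_i is the probability of a state flip on the branch to leaf i, and a uniform distribution at the internal node. It applies the Hadamard (Walsh) transform to site-pattern probabilities, which gives q_ij = θ_i θ_j for leaf pairs and zero for odd-size subsets. It then inverts these pairwise coordinates. All function names are hypothetical.

```python
import itertools
import math

# Site patterns for 3 leaves under a 2-state model: (x1, x2, x3) in {0,1}^3.
PATTERNS = list(itertools.product((0, 1), repeat=3))


def cfn_pattern_probs(theta):
    """Pattern probabilities on an unrooted 3-leaf (star) CFN tree.

    Assumed parameterization: theta[i] = 1 - 2*p_i, with p_i the flip
    probability on the branch to leaf i; internal node state is uniform.
    """
    probs = {}
    for x in PATTERNS:
        total = 0.0
        for center in (0, 1):  # sum over the unobserved internal-node state
            prod = 0.5  # uniform distribution at the internal node
            for xi, th in zip(x, theta):
                p_flip = (1 - th) / 2
                prod *= p_flip if xi != center else 1 - p_flip
            total += prod
        probs[x] = total
    return probs


def fourier_coords(probs):
    """Hadamard transform: q_S = sum_x p(x) * (-1)^(sum of x_i over i in S).

    Subsets S of leaves are encoded as 0/1 indicator tuples.
    """
    q = {}
    for S in itertools.product((0, 1), repeat=3):
        q[S] = sum(
            p * (-1) ** sum(s * xi for s, xi in zip(S, x))
            for x, p in probs.items()
        )
    return q


def invert_pairwise(q):
    """Recover theta from q_12, q_13, q_23 using q_ij = theta_i * theta_j.

    Returns None if some pairwise coordinate is nonpositive, where this
    naive inversion is undefined. A returned theta_i > 1 also lies outside
    the model. This only illustrates how data can fall outside the model;
    it is not the paper's characterization of when the MLE fails to exist.
    """
    q12, q13, q23 = q[(1, 1, 0)], q[(1, 0, 1)], q[(0, 1, 1)]
    if min(q12, q13, q23) <= 0:
        return None
    return (
        math.sqrt(q12 * q13 / q23),
        math.sqrt(q12 * q23 / q13),
        math.sqrt(q13 * q23 / q12),
    )
```

For example, with θ = (0.9, 0.5, 0.7), the coordinate q_12 equals 0.9 × 0.5 = 0.45, and `invert_pairwise` recovers the original θ. Working in Fourier coordinates turns the pattern probabilities, which are polynomials in the branch parameters, into simple monomials. This is why the reparameterization is convenient for group-based models.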