Chor Benny, Snir Sagi
School of Computer Science, Tel-Aviv University, Tel-Aviv 39040, Israel.
Math Biosci. 2007 Aug;208(2):347-58. doi: 10.1016/j.mbs.2006.04.001.
This work presents symbolic mathematical solutions to maximum likelihood on small phylogenetic trees. Maximum likelihood (ML) is increasingly used as an optimality criterion for selecting evolutionary trees, but finding the global optimum is computationally hard. We give general analytic solutions for a family of trees with four taxa and two-state characters under a molecular clock. Previously, analytic solutions were known only for three-taxon trees. Moving from three to four taxa greatly increases the complexity of the underlying algebraic system and requires novel techniques and approaches. Despite the simplicity of our model, solving ML analytically in it is close to the limit of today's tractability. Rooted four-taxon trees have two topologies: the fork (two subtrees with two leaves each) and the comb (one subtree with three leaves, the other with a single leaf). Combining the properties of molecular clock fork trees with the Hadamard conjugation, and employing the symbolic algebra software Maple, we derive a number of topology-dependent identities. Using these identities, we substantially simplify the system of polynomial equations for the fork. Finally, we employ the symbolic algebra software to obtain closed-form analytic solutions (expressed parametrically in the input data).
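As a rough numerical illustration of the objects the abstract refers to, the sketch below computes site-pattern probabilities for a molecular clock fork tree ((1,2),(3,4)) via the Hadamard conjugation and evaluates the log-likelihood that ML maximizes. It is not the authors' Maple derivation. It assumes the symmetric two-state (Cavender-Farris-Neyman) model, which the abstract does not name explicitly. The node heights `a`, `b`, `h`, the bitmask encoding of patterns, and all function names are hypothetical choices made for this example.

```python
import math
from itertools import product

# Patterns and splits are bitmasks over taxa 1..3 (bit i-1 set means taxon i
# differs from the reference taxon 4). This encoding is an assumption of the sketch.

def change_prob(length):
    """Probability of a state change along an edge (symmetric two-state model)."""
    return 0.5 * (1.0 - math.exp(-2.0 * length))

def fork_edges(a, b, h):
    """Edge lengths of the clock fork ((1,2),(3,4)), keyed by split bitmask.

    a, b: heights of the two cherries' parent nodes; h: height of the root.
    The two root edges are merged into one internal edge, which leaves the
    pattern probabilities unchanged under this reversible model.
    """
    assert 0 <= a <= h and 0 <= b <= h
    return {
        0b001: a,                  # pendant edge to taxon 1
        0b010: a,                  # pendant edge to taxon 2
        0b100: b,                  # pendant edge to taxon 3
        0b111: b,                  # pendant edge to taxon 4, split {1,2,3}|{4}
        0b011: (h - a) + (h - b),  # internal edge, split {1,2}|{3,4}
    }

def _sign(x, y):
    """Entry of the 8x8 Hadamard matrix: (-1)^{|x AND y|}."""
    return -1 if bin(x & y).count("1") % 2 else 1

def hadamard_spectrum(edges):
    """Pattern probabilities s = H^{-1} exp(H q) for four taxa."""
    n = 8
    q = [0.0] * n
    for mask, length in edges.items():
        q[mask] += length
    q[0] = -sum(edges.values())
    r = [math.exp(sum(_sign(x, y) * q[y] for y in range(n))) for x in range(n)]
    return [sum(_sign(x, y) * r[y] for y in range(n)) / n for x in range(n)]

def brute_force_spectrum(a, b, h):
    """Same probabilities, summing over all leaf and internal-node states."""
    def t(length, x, y):
        p = change_prob(length)
        return p if x != y else 1.0 - p
    s = [0.0] * 8
    for root, u, v, x1, x2, x3, x4 in product((0, 1), repeat=7):
        prob = 0.5 * t(h - a, root, u) * t(h - b, root, v)
        prob *= t(a, u, x1) * t(a, u, x2) * t(b, v, x3) * t(b, v, x4)
        mask = (x1 ^ x4) | ((x2 ^ x4) << 1) | ((x3 ^ x4) << 2)
        s[mask] += prob
    return s

def log_likelihood(counts, a, b, h):
    """Log-likelihood of observed pattern counts (indexed by bitmask)."""
    s = hadamard_spectrum(fork_edges(a, b, h))
    return sum(c * math.log(s[m]) for m, c in enumerate(counts) if c)
```

For example, `log_likelihood([10, 1, 1, 2, 1, 0, 0, 1], 0.1, 0.2, 0.5)` scores one candidate set of node heights against made-up counts. A numerical optimizer would search over `a`, `b`, and `h`, whereas the paper obtains the maximizer in closed form. In this sketch the clock fork makes some pattern probabilities coincide (for instance the patterns in which only taxon 1 or only taxon 2 differs), which hints at why topology-specific structure can shrink the polynomial system.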