进化树的最大似然性：难度与近似性

Maximum likelihood of evolutionary trees: hardness and approximation.

作者信息

Chor Benny, Tuller Tamir

机构信息

School of Computer Science, Tel-Aviv University Tel-Aviv, Israel.

出版信息

Bioinformatics. 2005 Jun;21 Suppl 1:i97-106. doi: 10.1093/bioinformatics/bti1027.

DOI:10.1093/bioinformatics/bti1027

PMID:15961504

Abstract

MOTIVATION

Maximum likelihood (ML) is an increasingly popular optimality criterion for selecting evolutionary trees. Yet the computational complexity of ML was open for over 20 years, and only recently resolved by the authors for the Jukes-Cantor model of substitution and its generalizations. It was proved that reconstructing the ML tree is computationally intractable (NP-hard). In this work we explore three directions, which extend that result.

RESULTS

(1) We show that ML under the assumption of molecular clock is still computationally intractable (NP-hard). (2) We show that not only is it computationally intractable to find the exact ML tree, even approximating the logarithm of the ML for any multiplicative factor smaller than 1.00175 is computationally intractable. (3) We develop an algorithm for approximating log-likelihood under the condition that the input sequences are sparse. It employs any approximation algorithm for parsimony, and asymptotically achieves the same approximation ratio. We note that ML reconstruction for sparse inputs is still hard under this condition, and furthermore many real datasets satisfy it.

摘要

动机

最大似然法（ML）是一种在选择进化树时越来越受欢迎的最优性标准。然而，最大似然法的计算复杂度在长达20多年的时间里一直悬而未决，直到最近作者才解决了替换的Jukes-Cantor模型及其推广形式的该问题。结果表明，重建最大似然树在计算上是难以处理的（NP难问题）。在这项工作中，我们探索了三个方向，扩展了这一结果。

结果

（1）我们表明，在分子钟假设下的最大似然法在计算上仍然难以处理（NP难问题）。（2）我们表明，不仅找到精确的最大似然树在计算上是难以处理的，而且对于任何小于1.00175的乘法因子，近似最大似然的对数在计算上也是难以处理的。（3）我们开发了一种在输入序列稀疏的条件下近似对数似然的算法。它采用任何简约性近似算法，并渐近地达到相同的近似比率。我们注意到，在这种条件下，稀疏输入的最大似然重建仍然很困难，而且许多真实数据集都满足这一条件。