Scornavacca Celine, Weller Mathias
ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France.
LIGM, Université Gustave Eiffel, CNRS, Paris, France.
Algorithms Mol Biol. 2022 Aug 20;17(1):15. doi: 10.1186/s13015-022-00216-w.
Phylogenetic reconstruction is one of the paramount challenges of contemporary bioinformatics. A subtask of existing tree reconstruction algorithms is modeled by the SMALL PARSIMONY problem: given a tree T and an assignment of character-states to its leaves, assign states to the internal nodes of T such as to minimize the parsimony score, that is, the number of edges of T connecting nodes with different states. While this problem is polynomial-time solvable on trees, the matter is more complicated if T contains reticulate events such as hybridizations or recombinations, i.e. when T is a network. Indeed, three different versions of the parsimony score on networks have been proposed and each of them is NP-hard to decide. Existing parameterized algorithms focus on combining the number c of possible character-states with the number of reticulate events (per biconnected component).
We consider the parameter treewidth t of the underlying undirected graph of the input network, presenting dynamic programming algorithms for (slight generalizations of) all three versions of the parsimony problem on size-n networks running in times [Formula: see text], [Formula: see text], and [Formula: see text], respectively. Our algorithms use a formulation of the treewidth that may facilitate formalizing treewidth-based dynamic programming algorithms on phylogenetic networks for other problems.
Our algorithms allow the computation of the three popular parsimony scores, modeling the evolutionary development of a (multistate) character on a given phylogenetic network of low treewidth. Our results subsume and improve previously known algorithm for all three variants. While our results rely on being given a "good" tree-decomposition of the input, encouraging theoretical results as well as practical implementations producing them are publicly available. We present a reformulation of tree decompositions in terms of "agreeing trees" on the same set of nodes. As this formulation may come more natural to researchers and engineers developing algorithms for phylogenetic networks, we hope to render exploiting the input network's treewidth as parameter more accessible to this audience.
系统发育重建是当代生物信息学面临的首要挑战之一。现有树形重建算法的一个子任务由小简约问题建模:给定一棵树(T)及其叶节点的字符状态分配,为(T)的内部节点分配状态,以最小化简约得分,即连接具有不同状态节点的(T)的边的数量。虽然这个问题在树上是多项式时间可解的,但如果(T)包含诸如杂交或重组等网状事件,即当(T)是一个网络时,情况就更复杂了。实际上,已经提出了网络上简约得分的三种不同版本,并且它们中的每一个在判定时都是NP难的。现有的参数化算法专注于将可能字符状态的数量(c)与网状事件的数量(每个双连通分量)相结合。
我们考虑输入网络的基础无向图的参数树宽(t),分别给出在大小为(n)的网络上针对简约问题的所有三个版本(略有推广)的动态规划算法,运行时间分别为[公式:见原文]、[公式:见原文]和[公式:见原文]。我们的算法使用一种树宽的表述,这可能有助于为系统发育网络上的其他问题形式化基于树宽的动态规划算法。
我们的算法允许计算三种流行的简约得分,对给定低树宽的系统发育网络上(多状态)字符的进化发展进行建模。我们的结果包含并改进了先前针对所有三个变体已知的算法。虽然我们的结果依赖于给定输入的“良好”树分解,但产生它们的令人鼓舞的理论结果以及实际实现都是公开可用的。我们根据同一组节点上的“一致树”对树分解进行了重新表述。由于这种表述对于为系统发育网络开发算法的研究人员和工程师来说可能更自然,我们希望使这一受众更容易利用输入网络的树宽作为参数。