Kannan Lavanya, Wheeler Ward C
Division of Invertebrate Zoology and Richard Gilder Graduate School, American Museum of Natural History, New York, NY - 10024, USA.
Algorithms Mol Biol. 2012 May 2;7(1):9. doi: 10.1186/1748-7188-7-9.
Phylogenetic networks are generalizations of phylogenetic trees, that are used to model evolutionary events in various contexts. Several different methods and criteria have been introduced for reconstructing phylogenetic trees. Maximum Parsimony is a character-based approach that infers a phylogenetic tree by minimizing the total number of evolutionary steps required to explain a given set of data assigned on the leaves. Exact solutions for optimizing parsimony scores on phylogenetic trees have been introduced in the past.
In this paper, we define the parsimony score on networks as the sum of the substitution costs along all the edges of the network; and show that certain well-known algorithms that calculate the optimum parsimony score on trees, such as Sankoff and Fitch algorithms extend naturally for networks, barring conflicting assignments at the reticulate vertices. We provide heuristics for finding the optimum parsimony scores on networks. Our algorithms can be applied for any cost matrix that may contain unequal substitution costs of transforming between different characters along different edges of the network. We analyzed this for experimental data on 10 leaves or fewer with at most 2 reticulations and found that for almost all networks, the bounds returned by the heuristics matched with the exhaustively determined optimum parsimony scores.
The parsimony score we define here does not directly reflect the cost of the best tree in the network that displays the evolution of the character. However, when searching for the most parsimonious network that describes a collection of characters, it becomes necessary to add additional cost considerations to prefer simpler structures, such as trees over networks. The parsimony score on a network that we describe here takes into account the substitution costs along the additional edges incident on each reticulate vertex, in addition to the substitution costs along the other edges which are common to all the branching patterns introduced by the reticulate vertices. Thus the score contains an in-built cost for the number of reticulate vertices in the network, and would provide a criterion that is comparable among all networks. Although the problem of finding the parsimony score on the network is believed to be computationally hard to solve, heuristics such as the ones described here would be beneficial in our efforts to find a most parsimonious network.
系统发育网络是系统发育树的推广,用于在各种情况下对进化事件进行建模。已经引入了几种不同的方法和标准来重建系统发育树。最大简约法是一种基于特征的方法,通过最小化解释分配在叶节点上的给定数据集所需的进化步骤总数来推断系统发育树。过去已经引入了在系统发育树上优化简约得分的精确解决方案。
在本文中,我们将网络上的简约得分定义为沿着网络所有边的替换成本之和;并表明某些在树上计算最优简约得分的著名算法,如 Sankoff 算法和 Fitch 算法,自然地扩展到网络,除非在网状顶点处存在冲突的分配。我们提供了在网络上找到最优简约得分的启发式方法。我们的算法可以应用于任何成本矩阵,该矩阵可能包含沿着网络不同边在不同字符之间转换的不等替换成本。我们针对最多有 2 个网状结构且叶节点数为 10 个或更少的实验数据进行了分析,发现对于几乎所有网络,启发式方法返回的边界与通过穷举确定的最优简约得分相匹配。
我们在此定义的简约得分并不直接反映网络中显示特征进化的最佳树的成本。然而,在寻找描述一组特征的最简约网络时,有必要添加额外的成本考虑因素,以偏好更简单的结构,如树而非网络。我们在此描述的网络上的简约得分除了考虑网状顶点引入的所有分支模式共有的其他边的替换成本外,还考虑了与每个网状顶点相关的额外边的替换成本。因此,该得分包含了网络中网状顶点数量的内在成本,并将提供一个在所有网络之间具有可比性的标准。尽管在网络上找到简约得分的问题被认为在计算上难以解决,但此处描述的启发式方法将有助于我们寻找最简约网络的努力。