Briand Samuel, Dessimoz Christophe, El-Mabrouk Nadia, Lafond Manuel, Lobinska Gabriela
Computer Science Department, Université de Montréal, Montreal, Canada.
Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.
BMC Genomics. 2020 Nov 18;21(Suppl 10):779. doi: 10.1186/s12864-020-07011-0.
The Robinson-Foulds (RF) distance is a well-established measure between phylogenetic trees. Despite a lack of biological justification, it has the advantages of being a proper metric and being computable in linear time. For phylogenetic applications involving genes, however, a crucial aspect of the trees ignored by the RF metric is the type of the branching event (e.g. speciation, duplication, transfer, etc).
We extend RF to trees with labeled internal nodes by including a node flip operation, alongside edge contractions and extensions. We explore properties of this extended RF distance in the case of a binary labeling. In particular, we show that contrary to the unlabeled case, an optimal edit path may require contracting "good" edges, i.e. edges shared between the two trees.
We provide a 2-approximation algorithm which is shown to perform well empirically. Looking ahead, computing distances between labeled trees opens up a variety of new algorithmic directions.Implementation and simulations available at https://github.com/DessimozLab/pylabeledrf .
罗宾逊 - 福尔兹(RF)距离是系统发育树之间一种成熟的度量方法。尽管缺乏生物学依据,但它具有是一种恰当度量且可在线性时间内计算的优点。然而,对于涉及基因的系统发育应用,RF度量忽略的系统发育树的一个关键方面是分支事件的类型(例如物种形成、基因复制、基因转移等)。
我们通过纳入节点翻转操作以及边收缩和边扩展,将RF扩展到具有标记内部节点的树。我们在二元标记的情况下探索这种扩展的RF距离的性质。特别地,我们表明与未标记的情况相反,最优编辑路径可能需要收缩“好”边,即两棵树之间共享的边。
我们提供了一种2近似算法,经实证表明其性能良好。展望未来,计算标记树之间的距离开辟了各种新的算法方向。实现和模拟可在https://github.com/DessimozLab/pylabeledrf获取。