IEEE/ACM Trans Comput Biol Bioinform. 2019 Jul-Aug;16(4):1077-1090. doi: 10.1109/TCBB.2017.2710342. Epub 2017 Jun 1.
Duplication-Transfer-Loss (DTL) reconciliation is a powerful method for studying gene family evolution in the presence of horizontal gene transfer. DTL reconciliation seeks to reconcile gene trees with species trees by postulating speciation, duplication, transfer, and loss events. Efficient algorithms exist for finding optimal DTL reconciliations when the gene tree is binary. In practice, however, gene trees are often non-binary due to uncertainty in the gene tree topologies, and DTL reconciliation with non-binary gene trees is known to be NP-hard. In this paper, we present the first exact algorithms for DTL reconciliation with non-binary gene trees. Specifically, we (i) show that the DTL reconciliation problem for non-binary gene trees is fixed-parameter tractable in the maximum degree of the gene tree, (ii) present an exponential-time, but in-practice efficient, algorithm to track and enumerate all optimal binary resolutions of a non-binary input gene tree, and (iii) apply our algorithms to a large empirical data set of over 4,700 gene trees from 100 species to study the impact of gene tree uncertainty on DTL-reconciliation and to demonstrate the applicability and utility of our algorithms. The new techniques and algorithms introduced in this paper will help biologists avoid incorrect evolutionary inferences caused by gene tree uncertainty.
复制-转移-缺失(DTL)协调是一种在存在水平基因转移的情况下研究基因家族进化的强大方法。DTL 协调通过假设物种形成、复制、转移和缺失事件来协调基因树与物种树。当基因树为二进制时,存在用于寻找最优 DTL 协调的有效算法。然而,在实践中,由于基因树拓扑结构的不确定性,基因树通常是非二进制的,并且已知 DTL 与非二进制基因树的协调是 NP 难的。在本文中,我们提出了用于非二进制基因树的 DTL 协调的第一个精确算法。具体来说,我们 (i) 表明,非二进制基因树的 DTL 协调问题在基因树的最大度数上是固定参数可解的,(ii) 提出了一种指数时间但在实践中有效的算法来跟踪和枚举非二进制输入基因树的所有最优二进制分辨率,以及 (iii) 将我们的算法应用于来自 100 个物种的超过 4700 个基因树的大型经验数据集,以研究基因树不确定性对 DTL 协调的影响,并展示我们的算法的适用性和实用性。本文引入的新技术和算法将帮助生物学家避免由于基因树不确定性导致的不正确的进化推断。