Department of Computer Engineering, Meybod University, Meybod, Iran.
School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran.
PLoS One. 2020 Sep 18;15(9):e0227842. doi: 10.1371/journal.pone.0227842. eCollection 2020.
Phylogenetic network construction is one of the most important challenges in phylogenetics. These networks can represent complex non-treelike events such as gene flow, horizontal gene transfer, recombination, or hybridization. Among phylogenetic networks, rooted structures are commonly used to represent the evolutionary history of a set of species explicitly. Triplets are a well-known input for constructing rooted networks. Obtaining an optimal rooted network that contains all given triplets is the main problem in network construction. The optimality criteria include minimizing the level or the number of reticulation nodes. This problem is known to be NP-hard. In this research, a new algorithm called Netcombin is introduced to construct an approximately optimal network that is consistent with the input triplets. The innovation of this algorithm is based on binarization and expanding processes. The binarization process uses a measure to construct a binary rooted tree T consistent with an approximately maximum number of input triplets. T is then expanded, using a heuristic function, by adding a minimum number of edges to obtain a final network with an approximately minimum number of reticulation nodes. To evaluate the proposed algorithm, Netcombin is compared with four state-of-the-art algorithms: RPNCH, NCHB, TripNet, and SIMPLISTIC. Experimental results on simulated data obtained from biologically generated sequence data indicate that, considering the trade-off between speed and precision, Netcombin outperforms the others.
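The abstract relies on the notion of a binary rooted tree T being consistent with a rooted triplet. As a minimal illustration of that notion only (not of Netcombin's binarization measure or expansion heuristic, which the abstract does not specify), the sketch below checks whether a tree displays a triplet ab|c and counts how many triplets of a given set it displays. The nested-tuple tree representation, the (a, b, c) encoding of ab|c, and the function names `leaves`, `is_consistent`, and `count_consistent` are assumptions made for this example.

```python
def leaves(tree):
    """Return the set of leaf labels under a nested-tuple binary tree.

    Assumed representation: a leaf is a string, an internal node is a
    2-tuple (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)


def is_consistent(tree, triplet):
    """Return True if the binary rooted tree displays the triplet ab|c.

    The triplet (a, b, c) stands for ab|c: the lowest common ancestor
    of a and b lies strictly below the lowest common ancestor of a and c.
    """
    a, b, c = triplet
    if isinstance(tree, str):
        return False  # a single leaf cannot display a triplet
    left, right = tree
    for side, other in ((left, right), (right, left)):
        side_leaves = leaves(side)
        if a in side_leaves and b in side_leaves:
            if c in leaves(other):
                return True  # a and b split off together, away from c
            if c in side_leaves:
                return is_consistent(side, triplet)  # decide further down
            return False  # c is not a leaf of this tree
    return False  # a and b are separated at this node (or missing)


def count_consistent(tree, triplets):
    """Count how many of the given triplets the tree displays."""
    return sum(is_consistent(tree, t) for t in triplets)


if __name__ == "__main__":
    T = (("a", "b"), ("c", "d"))
    triplets = [("a", "b", "c"), ("a", "c", "b"), ("c", "d", "a")]
    print(count_consistent(T, triplets))
```

This version recomputes leaf sets at every level for clarity, so it is quadratic in the tree size per triplet; a practical implementation would more likely precompute lowest common ancestors.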