Peng Zhan, Wang Yuping
School of Computer Science and Technology, Xidian UniversityXi'an, China.
Front Genet. 2017 Aug 9;8:104. doi: 10.3389/fgene.2017.00104. eCollection 2017.
Searching for the Multiple Longest Common Subsequences (MLCS) of multiple sequences is a classical NP-hard problem, which has been used in many applications. One of the most effective exact approaches for the MLCS problem is based on dominant point graph, which is a kind of directed acyclic graph (DAG). However, the time and space efficiency of the leading dominant point graph based approaches is still unsatisfactory: constructing the dominated point graph used by these approaches requires a huge amount of time and space, which hinders the applications of these approaches to large-scale and long sequences. To address this issue, in this paper, we propose a new time and space efficient graph model called the Leveled-DAG for the MLCS problem. The Leveled-DAG can timely eliminate all the nodes in the graph that cannot contribute to the construction of MLCS during constructing. At any moment, only the current level and some previously generated nodes in the graph need to be kept in memory, which can greatly reduce the memory consumption. Also, the final graph contains only one node in which all of the wanted MLCS are saved, thus, no additional operations for searching the MLCS are needed. The experiments are conducted on real biological sequences with different numbers and lengths respectively, and the proposed algorithm is compared with three state-of-the-art algorithms. The experimental results show that the time and space needed for the Leveled-DAG approach are smaller than those for the compared algorithms especially on large-scale and long sequences.
寻找多个序列的多个最长公共子序列(MLCS)是一个经典的NP难问题,已在许多应用中得到应用。解决MLCS问题最有效的精确方法之一是基于支配点图,它是一种有向无环图(DAG)。然而,基于支配点图的主流方法的时间和空间效率仍然不尽人意:构建这些方法所使用的支配点图需要大量的时间和空间,这阻碍了这些方法在大规模和长序列中的应用。为了解决这个问题,在本文中,我们针对MLCS问题提出了一种新的时空高效图模型——分层有向无环图(Leveled-DAG)。分层有向无环图在构建过程中可以及时消除图中所有对构建MLCS没有贡献的节点。在任何时刻,只需要在内存中保留图中的当前层和一些先前生成的节点,这可以大大减少内存消耗。此外,最终的图只包含一个节点,所有需要的MLCS都保存在该节点中,因此,无需进行额外的搜索MLCS的操作。分别对具有不同数量和长度的真实生物序列进行了实验,并将所提出的算法与三种最先进的算法进行了比较。实验结果表明,分层有向无环图方法所需的时间和空间比比较算法要小,特别是在大规模和长序列上。