• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

一种用于多最长公共子序列(MLCS)问题的新型高效图模型。

A Novel Efficient Graph Model for the Multiple Longest Common Subsequences (MLCS) Problem.

作者信息

Peng Zhan, Wang Yuping

机构信息

School of Computer Science and Technology, Xidian UniversityXi'an, China.

出版信息

Front Genet. 2017 Aug 9;8:104. doi: 10.3389/fgene.2017.00104. eCollection 2017.

DOI:10.3389/fgene.2017.00104
PMID:28848600
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC5552671/
Abstract

Searching for the Multiple Longest Common Subsequences (MLCS) of multiple sequences is a classical NP-hard problem, which has been used in many applications. One of the most effective exact approaches for the MLCS problem is based on dominant point graph, which is a kind of directed acyclic graph (DAG). However, the time and space efficiency of the leading dominant point graph based approaches is still unsatisfactory: constructing the dominated point graph used by these approaches requires a huge amount of time and space, which hinders the applications of these approaches to large-scale and long sequences. To address this issue, in this paper, we propose a new time and space efficient graph model called the Leveled-DAG for the MLCS problem. The Leveled-DAG can timely eliminate all the nodes in the graph that cannot contribute to the construction of MLCS during constructing. At any moment, only the current level and some previously generated nodes in the graph need to be kept in memory, which can greatly reduce the memory consumption. Also, the final graph contains only one node in which all of the wanted MLCS are saved, thus, no additional operations for searching the MLCS are needed. The experiments are conducted on real biological sequences with different numbers and lengths respectively, and the proposed algorithm is compared with three state-of-the-art algorithms. The experimental results show that the time and space needed for the Leveled-DAG approach are smaller than those for the compared algorithms especially on large-scale and long sequences.

摘要

寻找多个序列的多个最长公共子序列(MLCS)是一个经典的NP难问题,已在许多应用中得到应用。解决MLCS问题最有效的精确方法之一是基于支配点图,它是一种有向无环图(DAG)。然而,基于支配点图的主流方法的时间和空间效率仍然不尽人意:构建这些方法所使用的支配点图需要大量的时间和空间,这阻碍了这些方法在大规模和长序列中的应用。为了解决这个问题,在本文中,我们针对MLCS问题提出了一种新的时空高效图模型——分层有向无环图(Leveled-DAG)。分层有向无环图在构建过程中可以及时消除图中所有对构建MLCS没有贡献的节点。在任何时刻,只需要在内存中保留图中的当前层和一些先前生成的节点,这可以大大减少内存消耗。此外,最终的图只包含一个节点,所有需要的MLCS都保存在该节点中,因此,无需进行额外的搜索MLCS的操作。分别对具有不同数量和长度的真实生物序列进行了实验,并将所提出的算法与三种最先进的算法进行了比较。实验结果表明,分层有向无环图方法所需的时间和空间比比较算法要小,特别是在大规模和长序列上。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9fe0/5552671/4748f866398e/fgene-08-00104-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9fe0/5552671/11dfc730fbf9/fgene-08-00104-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9fe0/5552671/44a482dd6c13/fgene-08-00104-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9fe0/5552671/03b31c59aa5c/fgene-08-00104-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9fe0/5552671/4748f866398e/fgene-08-00104-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9fe0/5552671/11dfc730fbf9/fgene-08-00104-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9fe0/5552671/44a482dd6c13/fgene-08-00104-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9fe0/5552671/03b31c59aa5c/fgene-08-00104-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9fe0/5552671/4748f866398e/fgene-08-00104-g0004.jpg

相似文献

1
A Novel Efficient Graph Model for the Multiple Longest Common Subsequences (MLCS) Problem.一种用于多最长公共子序列(MLCS)问题的新型高效图模型。
Front Genet. 2017 Aug 9;8:104. doi: 10.3389/fgene.2017.00104. eCollection 2017.
2
A path recorder algorithm for Multiple Longest Common Subsequences (MLCS) problems.一种用于多重最长公共子序列(MLCS)问题的路径记录算法。
Bioinformatics. 2020 May 1;36(10):3035-3042. doi: 10.1093/bioinformatics/btaa134.
3
A fast and efficient path elimination algorithm for large-scale multiple common longest sequence problems.一种用于大规模多公共最长序列问题的快速高效路径消除算法。
BMC Bioinformatics. 2022 Sep 7;23(1):366. doi: 10.1186/s12859-022-04906-5.
4
A fast and memory efficient MLCS algorithm by character merging for DNA sequences alignment.一种基于字符合并的快速且节省内存的 DNA 序列比对 MLCS 算法。
Bioinformatics. 2020 Feb 15;36(4):1066-1073. doi: 10.1093/bioinformatics/btz725.
5
dwMLCS: An Efficient MLCS Algorithm Based on Dynamic and Weighted Directed Acyclic Graph.dwMLCS:一种基于动态加权有向无环图的高效最大公共子序列算法。
IEEE/ACM Trans Comput Biol Bioinform. 2024 Nov-Dec;21(6):1987-1999. doi: 10.1109/TCBB.2024.3431558. Epub 2024 Dec 10.
6
A Space-Bounded Anytime Algorithm for the Multiple Longest Common Subsequence Problem.一种针对多重最长公共子序列问题的空间受限随时算法。
IEEE Trans Knowl Data Eng. 2014 Nov;26(11):2599-2609. doi: 10.1109/TKDE.2014.2304464.
7
New Construction of Family of MLCS Algorithms.新的 MLCS 算法族构建。
J Healthc Eng. 2021 Jan 19;2021:6636710. doi: 10.1155/2021/6636710. eCollection 2021.
8
Deposition and extension approach to find longest common subsequence for thousands of long sequences.沉积和扩展方法来寻找数千个长序列的最长公共子序列。
Comput Biol Chem. 2010 Jun;34(3):149-57. doi: 10.1016/j.compbiolchem.2010.05.001. Epub 2010 May 11.
9
Algorithms for the Uniqueness of the Longest Common Subsequence.最长公共子序列唯一性算法。
J Bioinform Comput Biol. 2023 Dec;21(6):2350027. doi: 10.1142/S0219720023500270. Epub 2024 Jan 10.
10
A new algorithm for "the LCS problem" with application in compressing genome resequencing data.一种应用于压缩基因组重测序数据的“最长公共子序列问题”新算法。
BMC Genomics. 2016 Aug 18;17 Suppl 4(Suppl 4):544. doi: 10.1186/s12864-016-2793-0.

引用本文的文献

1
Using Sequence Analyses to Quantitatively Measure Oropharyngeal Swallowing Temporality in Point-of-Care Ultrasound Examinations: A Pilot Study.在床旁超声检查中使用序列分析定量测量口咽吞咽时间:一项初步研究。
J Clin Med. 2024 Apr 15;13(8):2288. doi: 10.3390/jcm13082288.
2
A fast and efficient path elimination algorithm for large-scale multiple common longest sequence problems.一种用于大规模多公共最长序列问题的快速高效路径消除算法。
BMC Bioinformatics. 2022 Sep 7;23(1):366. doi: 10.1186/s12859-022-04906-5.
3
New Construction of Family of MLCS Algorithms.

本文引用的文献

1
Next-Generation Sequencing of Circulating Tumor DNA for Early Cancer Detection.循环肿瘤 DNA 的下一代测序用于早期癌症检测。
Cell. 2017 Feb 9;168(4):571-574. doi: 10.1016/j.cell.2017.01.030.
2
A Space-Bounded Anytime Algorithm for the Multiple Longest Common Subsequence Problem.一种针对多重最长公共子序列问题的空间受限随时算法。
IEEE Trans Knowl Data Eng. 2014 Nov;26(11):2599-2609. doi: 10.1109/TKDE.2014.2304464.
3
A fast parallel algorithm for finding the longest common sequence of multiple biosequences.一种用于寻找多个生物序列最长公共序列的快速并行算法。
新的 MLCS 算法族构建。
J Healthc Eng. 2021 Jan 19;2021:6636710. doi: 10.1155/2021/6636710. eCollection 2021.
4
A multi-level hypoglycemia early alarm system based on sequence pattern mining.基于序列模式挖掘的多级低血糖早期报警系统。
BMC Med Inform Decis Mak. 2021 Jan 21;21(1):22. doi: 10.1186/s12911-021-01389-x.
BMC Bioinformatics. 2006 Dec 12;7 Suppl 4(Suppl 4):S4. doi: 10.1186/1471-2105-7-S4-S4.
4
Identification of common molecular subsequences.常见分子子序列的鉴定
J Mol Biol. 1981 Mar 25;147(1):195-7. doi: 10.1016/0022-2836(81)90087-5.
5
Matching sequences under deletion-insertion constraints.在缺失-插入约束下匹配序列。
Proc Natl Acad Sci U S A. 1972 Jan;69(1):4-6. doi: 10.1073/pnas.69.1.4.