Indexing maximal common subsequences for k strings

Authors

Buzzega Giovanni, Conte Alessio, Grossi Roberto, Punzi Giulia

Affiliation

Dipartimento di Informatica, Università di Pisa, Largo Pontecorvo 3, 56127, Pisa, Italy.

Publication

Algorithms Mol Biol. 2025 Apr 19;20(1):6. doi: 10.1186/s13015-025-00271-z.

DOI: 10.1186/s13015-025-00271-z
PMID: 40253370
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12008955/
Abstract

Analyzing and comparing sequences of symbols is among the most fundamental problems in computer science, possibly even more so in bioinformatics. Maximal Common Subsequences (MCSs), i.e., inclusion-maximal sequences of non-contiguous symbols common to two or more strings, have only recently received attention in this area, despite being a basic notion and a natural generalization of more common tools like Longest Common Substrings/Subsequences. In this paper we simplify and engineer recent advancements in MCSs into a practical tool, the first publicly available one that can index MCSs of real genomic data, and show that its definition can be generalized to multiple strings. We demonstrate that our tool can index pairs of sequences exceeding 10,000 base pairs within minutes, utilizing only 4-7% more than the minimum required nodes. For three or more sequences, we observe experimentally that the minimum index may exhibit a significant increase in the number of nodes.
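
To make the definition of an MCS concrete, the following is a minimal brute-force sketch in Python. It is not the indexing method described in the abstract: it simply enumerates every common subsequence of a few very short strings and keeps the inclusion-maximal ones, so it runs in exponential time and is only meant for toy inputs. The function names (`is_subsequence`, `common_subsequences`, `maximal_common_subsequences`) are hypothetical and chosen for illustration.

```python
from itertools import combinations


def is_subsequence(s, t):
    """Return True if s can be obtained from t by deleting characters."""
    it = iter(t)
    # `ch in it` advances the iterator, so matches must occur in order.
    return all(ch in it for ch in s)


def common_subsequences(strings):
    """Brute force: every subsequence of the shortest string that is
    also a subsequence of all the other strings (exponential time)."""
    shortest = min(strings, key=len)
    result = set()
    for r in range(len(shortest) + 1):
        for idxs in combinations(range(len(shortest)), r):
            cand = "".join(shortest[i] for i in idxs)
            if all(is_subsequence(cand, s) for s in strings):
                result.add(cand)
    return result


def maximal_common_subsequences(strings):
    """Keep only the common subsequences that are not a proper
    subsequence of another common subsequence (inclusion-maximal)."""
    common = common_subsequences(strings)
    return {
        c for c in common
        if not any(c != d and is_subsequence(c, d) for d in common)
    }


if __name__ == "__main__":
    # Two strings: "ab" is the longest common subsequence, but "c" is
    # also maximal because it cannot be extended to a longer common one.
    print(sorted(maximal_common_subsequences(["cab", "abc"])))
    # Three strings (the k > 2 case mentioned in the abstract).
    print(sorted(maximal_common_subsequences(["cab", "abc", "bca"])))
```

For the pair `"cab"` and `"abc"` the sketch yields `ab` and `c`, which shows that an MCS can be shorter than the longest common subsequence. Adding the third string `"bca"` leaves only the single-character MCSs `a`, `b`, and `c`.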
