• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

近似循环字符串匹配的快速算法。

Fast algorithms for approximate circular string matching.

作者信息

Barton Carl, Iliopoulos Costas S, Pissis Solon P

机构信息

King's College London, London, UK.

出版信息

Algorithms Mol Biol. 2014 Mar 22;9(1):9. doi: 10.1186/1748-7188-9-9.

DOI:10.1186/1748-7188-9-9
PMID:24656145
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC4234210/
Abstract

BACKGROUND

Circular string matching is a problem which naturally arises in many biological contexts. It consists in finding all occurrences of the rotations of a pattern of length m in a text of length n. There exist optimal average-case algorithms for exact circular string matching. Approximate circular string matching is a rather undeveloped area.

RESULTS

In this article, we present a suboptimal average-case algorithm for exact circular string matching requiring time O(n). Based on our solution for the exact case, we present two fast average-case algorithms for approximate circular string matching with k-mismatches, under the Hamming distance model, requiring time O(n) for moderate values of k, that is k=O(m/logm). We show how the same results can be easily obtained under the edit distance model. The presented algorithms are also implemented as library functions. Experimental results demonstrate that the functions provided in this library accelerate the computations by more than three orders of magnitude compared to a naïve approach.

CONCLUSIONS

We present two fast average-case algorithms for approximate circular string matching with k-mismatches; and show that they also perform very well in practice. The importance of our contribution is underlined by the fact that the provided functions may be seamlessly integrated into any biological pipeline. The source code of the library is freely available at http://www.inf.kcl.ac.uk/research/projects/asmf/.

摘要

背景

循环字符串匹配是一个在许多生物学情境中自然出现的问题。它在于在长度为n的文本中找到长度为m的模式的所有旋转出现情况。存在用于精确循环字符串匹配的最优平均情况算法。近似循环字符串匹配是一个相当未充分发展的领域。

结果

在本文中,我们提出一种用于精确循环字符串匹配的次优平均情况算法,其时间复杂度为O(n)。基于我们对精确情况的解决方案,我们提出两种用于在汉明距离模型下具有k个错配的近似循环字符串匹配的快速平均情况算法,对于适度的k值,即k = O(m / log m),时间复杂度为O(n)。我们展示了在编辑距离模型下如何轻松获得相同的结果。所提出的算法也被实现为库函数。实验结果表明,与朴素方法相比,该库中提供的函数将计算速度提高了三个多数量级。

结论

我们提出了两种用于具有k个错配的近似循环字符串匹配的快速平均情况算法;并表明它们在实践中也表现得非常好。我们贡献的重要性体现在所提供的函数可以无缝集成到任何生物学流程这一事实上。该库的源代码可在http://www.inf.kcl.ac.uk/research/projects/asmf/免费获取。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0284/4234210/cf926e065fe4/1748-7188-9-9-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0284/4234210/81bd1f10e58a/1748-7188-9-9-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0284/4234210/cf926e065fe4/1748-7188-9-9-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0284/4234210/81bd1f10e58a/1748-7188-9-9-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0284/4234210/cf926e065fe4/1748-7188-9-9-2.jpg

相似文献

1
Fast algorithms for approximate circular string matching.近似循环字符串匹配的快速算法。
Algorithms Mol Biol. 2014 Mar 22;9(1):9. doi: 10.1186/1748-7188-9-9.
2
libFLASM: a software library for fixed-length approximate string matching.libFLASM:一个用于固定长度近似字符串匹配的软件库。
BMC Bioinformatics. 2016 Nov 10;17(1):454. doi: 10.1186/s12859-016-1320-2.
3
A Simple, Fast, Filter-Based Algorithm for Approximate Circular Pattern Matching.
IEEE Trans Nanobioscience. 2016 Mar;15(2):93-100. doi: 10.1109/TNB.2016.2542062. Epub 2016 Mar 14.
4
Improved algorithms for approximate string matching (extended abstract).用于近似字符串匹配的改进算法(扩展摘要)。
BMC Bioinformatics. 2009 Jan 30;10 Suppl 1(Suppl 1):S10. doi: 10.1186/1471-2105-10-S1-S10.
5
A k-mismatch string matching for generalized edit distance using diagonal skipping method.基于对角线跳跃法的广义编辑距离的 k 不匹配字符串匹配。
PLoS One. 2021 May 4;16(5):e0251047. doi: 10.1371/journal.pone.0251047. eCollection 2021.
6
An algorithm for approximate tandem repeats.一种用于近似串联重复序列的算法。
J Comput Biol. 2001;8(1):1-18. doi: 10.1089/106652701300099038.
7
Improving hash-q exact string matching algorithm with perfect hashing for DNA sequences.利用完美哈希提高 DNA 序列的哈希-Q 精确字符串匹配算法。
Comput Biol Med. 2021 Apr;131:104292. doi: 10.1016/j.compbiomed.2021.104292. Epub 2021 Feb 22.
8
The WM-q multiple exact string matching algorithm for DNA sequences.用于 DNA 序列的 WM-q 多重精确字符串匹配算法。
Comput Biol Med. 2021 Sep;136:104656. doi: 10.1016/j.compbiomed.2021.104656. Epub 2021 Jul 17.
9
Fast exact string pattern-matching algorithms adapted to the characteristics of the medical language.适应医学语言特点的快速精确字符串模式匹配算法。
J Am Med Inform Assoc. 2000 Jul-Aug;7(4):378-91. doi: 10.1136/jamia.2000.0070378.
10
Error Tree: A Tree Structure for Hamming and Edit Distances and Wildcards Matching.错误树:用于汉明距离、编辑距离和通配符匹配的树结构。
J Comput Biol. 2015 Dec;22(12):1118-28. doi: 10.1089/cmb.2015.0132. Epub 2015 Sep 24.

引用本文的文献

1
Prediction of Circular RNA Secondary Structures and Their Targets.环状RNA二级结构及其靶标的预测
Adv Exp Med Biol. 2025;1485:59-74. doi: 10.1007/978-981-96-9428-0_5.
2
Fast phylogenetic inference from typing data.
Algorithms Mol Biol. 2018 Feb 15;13:4. doi: 10.1186/s13015-017-0119-7. eCollection 2018.
3
MARS: improving multiple circular sequence alignment using refined sequences.MARS:使用优化序列改进多重环状序列比对
BMC Genomics. 2017 Jan 14;18(1):86. doi: 10.1186/s12864-016-3477-5.

本文引用的文献

1
CSA: an efficient algorithm to improve circular DNA multiple alignment.CSA:一种改进环状DNA多重比对的高效算法。
BMC Bioinformatics. 2009 Jul 23;10:230. doi: 10.1186/1471-2105-10-230.
2
Lossless filter for multiple repeats with bounded edit distance.具有有界编辑距离的多重复无损滤波器。
Algorithms Mol Biol. 2009 Jan 30;4:3. doi: 10.1186/1748-7188-4-3.
3
The bacterial nucleoid: a highly organized and dynamic structure.细菌类核:一种高度有序且动态的结构。
4
libFLASM: a software library for fixed-length approximate string matching.libFLASM:一个用于固定长度近似字符串匹配的软件库。
BMC Bioinformatics. 2016 Nov 10;17(1):454. doi: 10.1186/s12859-016-1320-2.
5
Circular sequence comparison: algorithms and applications.循环序列比较:算法与应用
Algorithms Mol Biol. 2016 May 10;11:12. doi: 10.1186/s13015-016-0076-6. eCollection 2016.
6
SimpLiFiCPM: A Simple and Lightweight Filter-Based Algorithm for Circular Pattern Matching.
Int J Genomics. 2015;2015:259320. doi: 10.1155/2015/259320. Epub 2015 Oct 18.
7
Network analysis of circular permutations in multidomain proteins reveals functional linkages for uncharacterized proteins.多结构域蛋白质中环状排列的网络分析揭示了未表征蛋白质的功能联系。
Cancer Inform. 2015 Feb 19;13(Suppl 5):109-24. doi: 10.4137/CIN.S14059. eCollection 2014.
J Cell Biochem. 2005 Oct 15;96(3):506-21. doi: 10.1002/jcb.20519.
4
Archaeal genetics - the third way.古细菌遗传学——第三种方式。
Nat Rev Genet. 2005 Jan;6(1):58-73. doi: 10.1038/nrg1504.
5
THE CYCLIC HELIX AND CYCLIC COIL FORMS OF POLYOMA VIRAL DNA.多瘤病毒DNA的环状螺旋和环状盘绕形式
Proc Natl Acad Sci U S A. 1963 Oct;50(4):730-8. doi: 10.1073/pnas.50.4.730.
6
EVIDENCE FOR A RING STRUCTURE OF POLYOMA VIRUS DNA.多瘤病毒DNA环状结构的证据。
Proc Natl Acad Sci U S A. 1963 Aug;50(2):236-43. doi: 10.1073/pnas.50.2.236.