• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

正则表达式的近似匹配。

Approximate matching of regular expressions.

作者信息

Myers E W, Miller W

出版信息

Bull Math Biol. 1989;51(1):5-37. doi: 10.1007/BF02458834.

DOI:10.1007/BF02458834
PMID:2706401
Abstract

Given a sequence A and regular expression R, the approximate regular expression matching problem is to find a sequence matching R whose optimal alignment with A is the highest scoring of all such sequences. This paper develops an algorithm to solve the problem in time O(MN), where M and N are the lengths of A and R. Thus, the time requirement is asymptotically no worse than for the simpler problem of aligning two fixed sequences. Our method is superior to an earlier algorithm by Wagner and Seiferas in several ways. First, it treats real-valued costs, in addition to integer costs, with no loss of asymptotic efficiency. Second, it requires only O(N) space to deliver just the score of the best alignment. Finally, its structure permits implementation techniques that make it extremely fast in practice. We extend the method to accommodate gap penalties, as required for typical applications in molecular biology, and further refine it to search for sub-strings of A that strongly align with a sequence in R, as required for typical data base searches. We also show how to deliver an optimal alignment between A and R in only O(N + log M) space using O(MN log M) time. Finally, an O(MN(M + N) + N2log N) time algorithm is presented for alignment scoring schemes where the cost of a gap is an arbitrary increasing function of its length.

摘要

给定一个序列A和正则表达式R,近似正则表达式匹配问题是要找到一个与R匹配的序列,该序列与A的最优比对是所有此类序列中得分最高的。本文开发了一种算法,能在O(MN)时间内解决该问题,其中M和N分别是A和R的长度。因此,时间要求在渐近意义上不比比对两个固定序列的简单问题更差。我们的方法在几个方面优于Wagner和Seiferas早期的算法。首先,除了整数值代价外,它还能处理实数值代价,且不损失渐近效率。其次,它只需要O(N)的空间就能给出最佳比对的得分。最后,其结构允许采用一些实现技术,使其在实际应用中非常快速。我们扩展该方法以适应分子生物学典型应用所需的空位罚分,并进一步改进它以搜索A中与R中的序列强烈比对的子串,这是典型数据库搜索所需要的。我们还展示了如何使用O(MN log M)时间,仅在O(N + log M)空间内给出A和R之间的最优比对。最后,针对空位代价是其长度的任意递增函数的比对计分方案,给出了一个O(MN(M + N) + N2log N)时间的算法。

相似文献

1
Approximate matching of regular expressions.正则表达式的近似匹配。
Bull Math Biol. 1989;51(1):5-37. doi: 10.1007/BF02458834.
2
A space-efficient algorithm for the constrained pairwise sequence alignment problem.一种用于受限成对序列比对问题的节省空间的算法。
Genome Inform. 2005;16(2):237-46.
3
An efficient string matching algorithm with k differences for nucleotide and amino acid sequences.一种针对核苷酸和氨基酸序列的具有k个差异的高效字符串匹配算法。
Nucleic Acids Res. 1986 Jan 10;14(1):31-46. doi: 10.1093/nar/14.1.31.
4
Local sequence alignments with monotonic gap penalties.具有单调空位罚分的局部序列比对。
Bioinformatics. 1999 Jun;15(6):455-62. doi: 10.1093/bioinformatics/15.6.455.
5
Hierarchical method to align large numbers of biological sequences.用于比对大量生物序列的分层方法。
Methods Enzymol. 1990;183:456-74. doi: 10.1016/0076-6879(90)83031-4.
6
An approximate matching algorithm for finding (sub-)optimal sequences in S-attributed grammars.
Bioinformatics. 2002;18 Suppl 2:S250-9. doi: 10.1093/bioinformatics/18.suppl_2.s250.
7
Approximate matching of network expressions with spacers.
J Comput Biol. 1996 Spring;3(1):33-51. doi: 10.1089/cmb.1996.3.33.
8
Optimal alignment between groups of sequences and its application to multiple sequence alignment.序列组之间的最优比对及其在多序列比对中的应用。
Comput Appl Biosci. 1993 Jun;9(3):361-70. doi: 10.1093/bioinformatics/9.3.361.
9
libFLASM: a software library for fixed-length approximate string matching.libFLASM:一个用于固定长度近似字符串匹配的软件库。
BMC Bioinformatics. 2016 Nov 10;17(1):454. doi: 10.1186/s12859-016-1320-2.
10
Recent developments in linear-space alignment methods: a survey.线性空间对齐方法的最新进展:一项综述。
J Comput Biol. 1994 Winter;1(4):271-91. doi: 10.1089/cmb.1994.1.271.

引用本文的文献

1
Approximating edit distances between complex tandem repeats efficiently.高效近似复杂串联重复序列之间的编辑距离。
Bioinformatics. 2025 Mar 29;41(4). doi: 10.1093/bioinformatics/btaf155.
2
Genotyping sequence-resolved copy number variation using pangenomes reveals paralog-specific global diversity and expression divergence of duplicated genes.使用泛基因组对序列解析的拷贝数变异进行基因分型,揭示了重复基因的旁系同源物特异性全球多样性和表达差异。
bioRxiv. 2025 Apr 13:2024.08.11.607269. doi: 10.1101/2024.08.11.607269.
3
A landscape of complex tandem repeats within individual human genomes.

本文引用的文献

1
Optimal sequence alignments.最佳序列比对。
Proc Natl Acad Sci U S A. 1983 Mar;80(5):1382-6. doi: 10.1073/pnas.80.5.1382.
2
An improved algorithm for matching biological sequences.一种用于匹配生物序列的改进算法。
J Mol Biol. 1982 Dec 15;162(3):705-8. doi: 10.1016/0022-2836(82)90398-9.
3
Rapid searches for complex patterns in biological molecules.快速搜索生物分子中的复杂模式。
个体人类基因组内复杂串联重复的景观。
Nat Commun. 2023 Sep 14;14(1):5530. doi: 10.1038/s41467-023-41262-1.
4
Decomposing mosaic tandem repeats accurately from long reads.从长读中准确分解镶嵌串联重复序列。
Bioinformatics. 2023 Apr 3;39(4). doi: 10.1093/bioinformatics/btad185.
5
Detection of Antibodies Against the SARS-CoV-2 Spike Protein and Analysis of the Peripheral Blood Mononuclear Cell Transcriptomic Profile, 15 Years After Recovery From SARS.SARS 康复 15 年后对 SARS-CoV-2 刺突蛋白的抗体检测及外周血单个核细胞转录组谱分析。
Front Cell Infect Microbiol. 2021 Nov 18;11:768993. doi: 10.3389/fcimb.2021.768993. eCollection 2021.
6
Expression Analysis of Circular RNAs in Young and Sexually Mature Boar Testes.青年与性成熟公猪睾丸中环状RNA的表达分析
Animals (Basel). 2021 May 17;11(5):1430. doi: 10.3390/ani11051430.
7
Pangenome Graphs.泛基因组图谱。
Annu Rev Genomics Hum Genet. 2020 Aug 31;21:139-162. doi: 10.1146/annurev-genom-120219-080406. Epub 2020 May 26.
8
Fully-sensitive seed finding in sequence graphs using a hybrid index.使用混合索引在序列图中进行完全敏感的种子发现。
Bioinformatics. 2019 Jul 15;35(14):i81-i89. doi: 10.1093/bioinformatics/btz341.
9
Bit-parallel sequence-to-graph alignment.位并行序列到图的对齐。
Bioinformatics. 2019 Oct 1;35(19):3599-3607. doi: 10.1093/bioinformatics/btz162.
10
Profiling the miRNA-mRNA-lncRNA interaction network in MSC osteoblast differentiation induced by (+)-cholesten-3-one.分析 (+)-胆甾-3-酮诱导间充质干细胞成骨分化过程中的 miRNA-mRNA-lncRNA 相互作用网络。
BMC Genomics. 2018 Oct 29;19(1):783. doi: 10.1186/s12864-018-5155-2.
Nucleic Acids Res. 1984 Jan 11;12(1 Pt 1):263-80. doi: 10.1093/nar/12.1part1.263.
4
Turn prediction in proteins using a pattern-matching approach.使用模式匹配方法预测蛋白质中的转角。
Biochemistry. 1986 Jan 14;25(1):266-75. doi: 10.1021/bi00349a037.
5
Optimal alignments in linear space.线性空间中的最优比对
Comput Appl Biosci. 1988 Mar;4(1):11-7. doi: 10.1093/bioinformatics/4.1.11.
6
Sequence comparison with concave weighting functions.使用凹加权函数进行序列比较。
Bull Math Biol. 1988;50(2):97-120. doi: 10.1007/BF02459948.