• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

第\(th\)子词复杂度的渐近分析。

Asymptotic Analysis of the th Subword Complexity.

作者信息

Ahmadi Lida, Ward Mark Daniel

机构信息

Department of Mathematics, Purdue University, West Lafayette, IN 47907, USA.

Department of Statistics, Purdue University, West Lafayette, IN 47907, USA.

出版信息

Entropy (Basel). 2020 Feb 12;22(2):207. doi: 10.3390/e22020207.

DOI:10.3390/e22020207
PMID:33285983
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC7516637/
Abstract

Patterns within strings enable us to extract vital information regarding a string's randomness. Understanding whether a string is random (Showing no to little repetition in patterns) or periodic (showing repetitions in patterns) are described by a value that is called the th Subword Complexity of the character string. By definition, the th Subword Complexity is the number of distinct substrings of length that appear in a given string. In this paper, we evaluate the expected value and the second factorial moment (followed by a corollary on the second moment) of the th Subword Complexity for the binary strings over memory-less sources. We first take a combinatorial approach to derive a probability generating function for the number of occurrences of patterns in strings of finite length. This enables us to have an exact expression for the two moments in terms of patterns' auto-correlation and correlation polynomials. We then investigate the asymptotic behavior for values of k = Θ ( log n ) . In the proof, we compare the distribution of the th Subword Complexity of binary strings to the distribution of distinct prefixes of independent strings stored in a trie. The methodology that we use involves complex analysis, analytical poissonization and depoissonization, the Mellin transform, and saddle point analysis.

摘要

字符串中的模式使我们能够提取有关字符串随机性的重要信息。通过一个称为字符串的第(k)子词复杂度的值来描述一个字符串是随机的(模式中几乎没有重复)还是周期性的(模式中有重复)。根据定义,第(k)子词复杂度是给定字符串中出现的长度为(k)的不同子串的数量。在本文中,我们评估了无记忆源上二进制字符串的第(k)子词复杂度的期望值和二阶阶乘矩(随后是关于二阶矩的一个推论)。我们首先采用组合方法来推导有限长度字符串中模式出现次数的概率生成函数。这使我们能够根据模式的自相关和相关多项式得到这两个矩的精确表达式。然后我们研究(k = Θ ( \log n ))时的渐近行为。在证明过程中,我们将二进制字符串的第(k)子词复杂度的分布与存储在字典树中的独立字符串的不同前缀的分布进行比较。我们使用的方法涉及复分析、解析泊松化和去泊松化、梅林变换以及鞍点分析。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a78/7516637/2dfb377ee2c9/entropy-22-00207-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a78/7516637/d4071e328dbe/entropy-22-00207-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a78/7516637/1007b73994a5/entropy-22-00207-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a78/7516637/2dfb377ee2c9/entropy-22-00207-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a78/7516637/d4071e328dbe/entropy-22-00207-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a78/7516637/1007b73994a5/entropy-22-00207-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a78/7516637/2dfb377ee2c9/entropy-22-00207-g003.jpg

相似文献

1
Asymptotic Analysis of the th Subword Complexity.第\(th\)子词复杂度的渐近分析。
Entropy (Basel). 2020 Feb 12;22(2):207. doi: 10.3390/e22020207.
2
Decision Trees for Binary Subword-Closed Languages.二元子词封闭语言的决策树
Entropy (Basel). 2023 Feb 14;25(2):349. doi: 10.3390/e25020349.
3
Fast exact algorithms for the closest string and substring problems with application to the planted (L, d)-motif model.快速精确算法求解最接近字符串和子字符串问题及其在 (L, d)-基序模型中的应用。
IEEE/ACM Trans Comput Biol Bioinform. 2011 Sep-Oct;8(5):1400-10. doi: 10.1109/TCBB.2011.21.
4
Helmholtz vibrations in bowed strings.弓弦乐器中的亥姆霍兹振动。
J Acoust Soc Am. 2022 Apr;151(4):2461. doi: 10.1121/10.0010159.
5
Extension of the method of moments for population balances involving fractional moments and application to a typical agglomeration problem.
J Colloid Interface Sci. 2004 Aug 1;276(1):106-12. doi: 10.1016/j.jcis.2004.03.052.
6
Azure-winged magpies solve string-pulling tasks by partial understanding of the physical cognition.蓝翅喜鹊通过对物理认知的部分理解来解决拉绳任务。
Curr Zool. 2019 Aug;65(4):385-392. doi: 10.1093/cz/zoy070. Epub 2018 Sep 11.
7
Direct calculation of the moments of the distribution of photon time of flight in tissue with a finite-element method.用有限元方法直接计算组织中光子飞行时间分布的矩。
Appl Opt. 1995 May 20;34(15):2683-7. doi: 10.1364/AO.34.002683.
8
Algorithmic complexity for short binary strings applied to psychology: a primer.短二进制字符串在心理学中的算法复杂度:简介。
Behav Res Methods. 2014 Sep;46(3):732-44. doi: 10.3758/s13428-013-0416-0.
9
An Informational Test for Random Finite Strings.随机有限字符串的信息性测试
Entropy (Basel). 2018 Dec 6;20(12):934. doi: 10.3390/e20120934.
10
Algorithmic complexity for psychology: a user-friendly implementation of the coding theorem method.心理学的算法复杂性:编码定理方法的用户友好型实现
Behav Res Methods. 2016 Mar;48(1):314-29. doi: 10.3758/s13428-015-0574-3.

引用本文的文献

1
Information Theory and Language.信息论与语言
Entropy (Basel). 2020 Apr 11;22(4):435. doi: 10.3390/e22040435.

本文引用的文献

1
Genomic DNA k-mer spectra: models and modalities.基因组 DNA k--mer 频谱:模型与模态。
Genome Biol. 2009;10(10):R108. doi: 10.1186/gb-2009-10-10-r108. Epub 2009 Oct 8.
2
De novo identification of repeat families in large genomes.大型基因组中重复家族的从头鉴定。
Bioinformatics. 2005 Jun;21 Suppl 1:i351-8. doi: 10.1093/bioinformatics/bti1018.
3
Frequent oligonucleotides and peptides of the Haemophilus influenzae genome.流感嗜血杆菌基因组的常见寡核苷酸和肽段。
Nucleic Acids Res. 1996 Nov 1;24(21):4263-72. doi: 10.1093/nar/24.21.4263.
4
Linguistics of nucleotide sequences. II: Stationary words in genetic texts and the zonal structure of DNA.
J Biomol Struct Dyn. 1989 Apr;6(5):1027-38. doi: 10.1080/07391102.1989.10506529.
5
Over- and under-representation of short oligonucleotides in DNA sequences.DNA序列中短寡核苷酸的过度和不足表现
Proc Natl Acad Sci U S A. 1992 Feb 15;89(4):1358-62. doi: 10.1073/pnas.89.4.1358.
6
Base compositional structure of genomes.
Genomics. 1992 Aug;13(4):1056-64. doi: 10.1016/0888-7543(92)90019-o.
7
Statistical analyses of counts and distributions of restriction sites in DNA sequences.DNA序列中限制性酶切位点数量及分布的统计分析。
Nucleic Acids Res. 1992 Mar 25;20(6):1363-70. doi: 10.1093/nar/20.6.1363.