

Is Natural Language a Perigraphic Process? The Theorem about Facts and Words Revisited.

Author

Dębowski Łukasz

Affiliation

Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248 Warszawa, Poland.

Publication

Entropy (Basel). 2018 Jan 26;20(2):85. doi: 10.3390/e20020085.

DOI: 10.3390/e20020085
PMID: 33265176
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7512648/
Abstract

As we discuss, a stationary stochastic process is nonergodic when a random persistent topic can be detected in the infinite random text sampled from the process, whereas we call the process strongly nonergodic when an infinite sequence of independent random bits, called probabilistic facts, is needed to describe this topic completely. Replacing probabilistic facts with an algorithmically random sequence of bits, called algorithmic facts, we adapt this property back to ergodic processes. Subsequently, we call a process perigraphic if the number of algorithmic facts which can be inferred from a finite text sampled from the process grows like a power of the text length. We present a simple example of such a process. Moreover, we demonstrate an assertion which we call the theorem about facts and words. This proposition states that the number of probabilistic or algorithmic facts which can be inferred from a text drawn from a process must be roughly smaller than the number of distinct word-like strings detected in this text by means of the Prediction by Partial Matching (PPM) compression algorithm. We also observe that the number of the word-like strings for a sample of plays by Shakespeare follows an empirical stepwise power law, in a stark contrast to Markov processes. Hence, we suppose that natural language considered as a process is not only non-Markov but also perigraphic.
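The abstract's key empirical claim is that the number of distinct word-like strings in a text grows like a power of the text length for natural language, unlike for Markov processes. As a loose illustration only (not the paper's PPM-based estimator), the sketch below counts distinct whitespace-delimited words V(n) in growing prefixes of a text and fits a power law V(n) ≈ C·n^β by least squares in log-log space, in the spirit of Heaps' law; the synthetic corpus and all function names are our own assumptions.

```python
# Illustration (not the paper's PPM method): measure vocabulary growth V(n)
# over prefixes of a text and fit a power law V(n) ~ C * n**beta.
import math

def distinct_word_counts(text, checkpoints):
    """Number of distinct words seen after the first n words, for each n."""
    seen = set()
    counts = []
    targets = iter(checkpoints)
    target = next(targets, None)
    for i, w in enumerate(text.split(), 1):
        seen.add(w)
        if i == target:
            counts.append(len(seen))
            target = next(targets, None)
    return counts

def fit_power_law(ns, vs):
    """Least-squares fit of log V = log C + beta * log n; returns (C, beta)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(v) for v in vs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - beta * mx), beta

# Toy corpus whose vocabulary grows like sqrt(n) by construction.
text = " ".join(f"w{int(i ** 0.5)}" for i in range(1, 10001))
ns = [1000, 2000, 4000, 8000]
vs = distinct_word_counts(text, ns)
c, beta = fit_power_law(ns, vs)
print(vs, round(beta, 2))  # → [31, 44, 63, 89] 0.51
```

The fitted exponent β ≈ 0.5 recovers the growth rate built into the toy corpus; for a Markov process one would instead expect vocabulary growth that flattens out, which is the contrast the paper exploits.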


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8a4f/7512648/11bdc89825e4/entropy-20-00085-g001.jpg

Similar Articles

1. Is Natural Language a Perigraphic Process? The Theorem about Facts and Words Revisited.
Entropy (Basel). 2018 Jan 26;20(2):85. doi: 10.3390/e20020085.
2. A Refutation of Finite-State Language Models through Zipf's Law for Factual Knowledge.
Entropy (Basel). 2021 Sep 1;23(9):1148. doi: 10.3390/e23091148.
3. Excess entropy in natural language: Present state and perspectives.
Chaos. 2011 Sep;21(3):037105. doi: 10.1063/1.3630929.
4. Word organization in coding DNA: a mathematical model.
Theory Biosci. 2006 Aug;125(1):1-17. doi: 10.1016/j.thbio.2006.03.002. Epub 2006 Apr 27.
5. Quantifying nonergodicity in nonautonomous dissipative dynamical systems: An application to climate change.
Phys Rev E. 2016 Aug;94(2-1):022214. doi: 10.1103/PhysRevE.94.022214. Epub 2016 Aug 24.
6. Language as an evolving word web.
Proc Biol Sci. 2001 Dec 22;268(1485):2603-6. doi: 10.1098/rspb.2001.1824.
7. Random texts do not exhibit the real Zipf's law-like rank distribution.
PLoS One. 2010 Mar 9;5(3):e9411. doi: 10.1371/journal.pone.0009411.
8. Understanding Zipf's law of word frequencies through sample-space collapse in sentence formation.
J R Soc Interface. 2015 Jul 6;12(108):20150330. doi: 10.1098/rsif.2015.0330.
9. From computing with numbers to computing with words. From manipulation of measurements to manipulation of perceptions.
Ann N Y Acad Sci. 2001 Apr;929:221-52.
10. Latent IBP Compound Dirichlet Allocation.
IEEE Trans Pattern Anal Mach Intell. 2015 Feb;37(2):321-33. doi: 10.1109/TPAMI.2014.2313122.

Cited By

1. Information Theory Opens New Dimensions in Experimental Studies of Animal Behaviour and Communication.
Animals (Basel). 2023 Mar 26;13(7):1174. doi: 10.3390/ani13071174.
2. An Information Theoretic Approach to Symbolic Learning in Synthetic Languages.
Entropy (Basel). 2022 Feb 10;24(2):259. doi: 10.3390/e24020259.
3. A Challenge for Contrastive L1/L2 Corpus Studies: Large Inter- and Intra-Individual Variation Across Morphological, but Not Global Syntactic Categories in Task-Based Corpus Data of a Homogeneous L1 German Group.
Front Psychol. 2021 Nov 25;12:716485. doi: 10.3389/fpsyg.2021.716485. eCollection 2021.

References

1. Excess entropy in natural language: Present state and perspectives.
Chaos. 2011 Sep;21(3):037105. doi: 10.1063/1.3630929.
2. Regularities unseen, randomness observed: levels of entropy convergence.
Chaos. 2003 Mar;13(1):25-54. doi: 10.1063/1.1530990.
3. Language acquisition and the discovery of phrase structure.
Lang Speech. 1980 Jul-Sep;23(Pt 3):255-69.
4. A Refutation of Finite-State Language Models through Zipf's Law for Factual Knowledge.
Entropy (Basel). 2021 Sep 1;23(9):1148. doi: 10.3390/e23091148.
5. Approximating Information Measures for Fields.
Entropy (Basel). 2020 Jan 9;22(1):79. doi: 10.3390/e22010079.
6. Estimating Predictive Rate-Distortion Curves via Neural Variational Inference.
Entropy (Basel). 2019 Jun 28;21(7):640. doi: 10.3390/e21070640.
7. Power Law Behaviour in Complex Systems.
Entropy (Basel). 2018 Sep 5;20(9):671. doi: 10.3390/e20090671.