A Refutation of Finite-State Language Models through Zipf's Law for Factual Knowledge.

Author information

Dębowski Łukasz

Affiliation

Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248 Warszawa, Poland.

Publication information

Entropy (Basel). 2021 Sep 1;23(9):1148. doi: 10.3390/e23091148.

DOI: 10.3390/e23091148
PMID: 34573773
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8465033/
Abstract

We present a hypothetical argument against finite-state processes in statistical language modeling that is based on semantics rather than syntax. In this theoretical model, we suppose that the semantic properties of texts in a natural language could be approximately captured by a recently introduced concept of a perigraphic process. Perigraphic processes are a class of stochastic processes that satisfy a Zipf-law accumulation of a subset of factual knowledge, which is time-independent, compressed, and effectively inferrable from the process. We show that the classes of finite-state processes and of perigraphic processes are disjoint, and we present a new simple example of perigraphic processes over a finite alphabet called Oracle processes. The disjointness result makes use of the Hilberg condition, i.e., the almost sure power-law growth of algorithmic mutual information. Using a strongly consistent estimator of the number of hidden states, we show that finite-state processes do not satisfy the Hilberg condition whereas Oracle processes satisfy the Hilberg condition via the data-processing inequality. We discuss the relevance of these mathematical results for theoretical and computational linguistics.
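
The tension between bounded memory and power-law growth of mutual information can be sketched with the Shannon analogue of the quantities named in the abstract. This is only an illustration under stated assumptions: the paper works with algorithmic mutual information and a consistency argument for an estimator of the number of hidden states, and the symbols X, Y, k, and β below are notation chosen here, not taken from the abstract.

```latex
% Hilberg-type condition, Shannon analogue (illustrative notation):
% mutual information between adjacent blocks of length n grows like a power law.
I(X_{1:n};\, X_{n+1:2n}) \;\gtrsim\; n^{\beta}, \qquad \beta \in (0,1).

% Finite-state (hidden Markov) process with k hidden states.
% Let Y denote the hidden state at the boundary between the two blocks,
% so that X_{1:n} \to Y \to X_{n+1:2n} forms a Markov chain.
% The data-processing inequality then gives a bound that does not grow with n:
I(X_{1:n};\, X_{n+1:2n}) \;\le\; I(Y;\, X_{n+1:2n}) \;\le\; H(Y) \;\le\; \log k .
```

Since the right-hand side of the second display is constant in n, no process of this form can satisfy the first display for large n. This is the intuition behind the disjointness result, not its proof.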
