Beyond word frequency: bursts, lulls, and scaling in the temporal distributions of words.

Affiliation

Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA.

Publication information

PLoS One. 2009 Nov 11;4(11):e7678. doi: 10.1371/journal.pone.0007678.

Abstract

BACKGROUND: Zipf's discovery that word frequency distributions obey a power law established parallels between language and biological and physical processes, laying the groundwork for a complex systems perspective on human communication. More recent research has also identified scaling regularities in the dynamics underlying the successive occurrences of events, suggesting that similar regularities may hold for language as well.

METHODOLOGY/PRINCIPAL FINDINGS: By considering frequent words in USENET discussion groups and in disparate databases where the language has different levels of formality, here we show that the distributions of distances between successive occurrences of the same word display bursty deviations from a Poisson process and are well characterized by a stretched exponential (Weibull) scaling. The extent of this deviation depends strongly on semantic type (a measure of the logicality of each word) and less strongly on frequency. We develop a generative model of this behavior that fully determines the dynamics of word usage.
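To make the measured quantity concrete, the following is a minimal sketch of how the distances between successive occurrences of a word could be extracted from a token sequence and summarized by a Weibull shape parameter. It is an illustration under stated assumptions, not the authors' procedure: the abstract does not say how the fits were done, the function names `recurrence_distances` and `fit_weibull` are hypothetical, and treating integer token distances as continuous values for the maximum-likelihood fit is a simplification.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def recurrence_distances(tokens: Sequence[str], word: str) -> List[int]:
    """Distances (in tokens) between successive occurrences of `word`."""
    positions = [i for i, tok in enumerate(tokens) if tok == word]
    return [b - a for a, b in zip(positions, positions[1:])]


def fit_weibull(distances: Iterable[float]) -> Tuple[float, float]:
    """Maximum-likelihood Weibull fit; returns (shape beta, scale a).

    Assumption: distances are treated as continuous positive values.
    A shape near 1 corresponds to exponential (Poisson-like) gaps;
    a shape below 1 indicates burstier behavior.
    """
    xs = [float(d) for d in distances if d > 0]
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct positive distances")
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(logs)
    max_log = max(logs)

    def weights(beta: float) -> List[float]:
        # x**beta rescaled by max(x)**beta so the values stay in [0, 1].
        return [math.exp(beta * (lx - max_log)) for lx in logs]

    def shape_equation(beta: float) -> float:
        # Standard MLE condition for the Weibull shape; increasing in beta.
        w = weights(beta)
        weighted_mean_log = sum(wi * lx for wi, lx in zip(w, logs)) / sum(w)
        return weighted_mean_log - 1.0 / beta - mean_log

    # Bracket the root, then bisect.
    lo, hi = 1e-3, 1.0
    while shape_equation(hi) < 0:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if shape_equation(mid) < 0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)

    # Scale from a**beta = mean(x**beta), undoing the rescaling above.
    w = weights(beta)
    scale = math.exp(max_log) * (sum(w) / len(w)) ** (1.0 / beta)
    return beta, scale
```

A typical use would be `fit_weibull(recurrence_distances(tokens, "the"))` on a tokenized corpus, then comparing the fitted shape across words. How far below 1 the shape falls is one way to quantify the "bursty deviation from a Poisson process" described above.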

CONCLUSIONS/SIGNIFICANCE: Recurrence patterns of words are well described by a stretched exponential distribution of recurrence times, an empirical scaling that cannot be anticipated from Zipf's law. Because the use of words provides a uniquely precise and powerful lens on human thought and activity, our findings also have implications for other overt manifestations of collective human dynamics.
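For reference, the standard two-parameter Weibull form is sketched below. The symbols (recurrence time τ, scale a, shape β) are labels chosen here for illustration; the abstract itself does not give the formula or its notation.

```latex
% Weibull (stretched exponential) density of the recurrence time \tau
f(\tau) = \frac{\beta}{a}\left(\frac{\tau}{a}\right)^{\beta-1}
          \exp\!\left[-\left(\frac{\tau}{a}\right)^{\beta}\right],
\qquad \tau > 0,\; a > 0,\; \beta > 0

% Probability that a recurrence time exceeds \tau
P(T > \tau) = \exp\!\left[-\left(\frac{\tau}{a}\right)^{\beta}\right]

% Mean recurrence time
\langle \tau \rangle = a\,\Gamma\!\left(1 + \frac{1}{\beta}\right)
```

Setting β = 1 recovers the exponential distribution of waiting times expected for a Poisson process, while β < 1 gives an excess of both very short and very long gaps, which is the pattern of bursts and lulls named in the title.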

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/acfd/2770836/9ae9b378537d/pone.0007678.g001.jpg
