Zipf's Law Arises Naturally When There Are Underlying, Unobserved Variables.

Authors

Aitchison Laurence, Corradi Nicola, Latham Peter E

Affiliations

Gatsby Computational Neuroscience Unit, University College London, London, United Kingdom.

Weill Medical College, Cornell University, New York, New York, United States of America.

Publication information

PLoS Comput Biol. 2016 Dec 20;12(12):e1005110. doi: 10.1371/journal.pcbi.1005110. eCollection 2016 Dec.

Abstract

Zipf's law, which states that the probability of an observation is inversely proportional to its rank, has been observed in many domains. While there are models that explain Zipf's law in each of them, those explanations are typically domain specific. Recently, methods from statistical physics were used to show that a fairly broad class of models does provide a general explanation of Zipf's law. This explanation rests on the observation that real-world data is often generated from underlying causes, known as latent variables. Those latent variables mix together multiple models that do not obey Zipf's law, giving a model that does. Here we extend that work both theoretically and empirically. Theoretically, we provide a far simpler and more intuitive explanation of Zipf's law, which at the same time considerably extends the class of models to which this explanation can apply. Furthermore, we also give methods for verifying whether this explanation applies to a particular dataset. Empirically, these advances allowed us to extend this explanation to important classes of data, including word frequencies (the first domain in which Zipf's law was discovered), data with variable sequence length, and multi-neuron spiking activity.
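
The mixing mechanism described in the abstract can be illustrated with a small toy model. The sketch below is not taken from the paper: the model (binary words of length n whose bits are independent given a latent probability p drawn uniformly from [0, 1]), the function names, and the crude least-squares slope check are all assumptions chosen for illustration. For any fixed p, a word with k ones has probability p^k (1-p)^(n-k); averaging over the latent p gives exactly 1 / ((n+1) C(n,k)), so the rank-probability curve can be computed without sampling. Under Zipf's law, P(rank) is proportional to 1/rank, so log-probability plotted against log-rank should have a slope in the vicinity of -1.

```python
import math

def mixture_rank_groups(n):
    """Toy model (an assumption, not the paper's): n-bit words, bits i.i.d.
    Bernoulli(p) given a latent p ~ Uniform(0, 1).
    Marginally, a word with k ones has probability 1 / ((n + 1) * C(n, k)).
    Returns (mid_rank, probability, group_size) per group of tied words,
    ordered from most to least probable."""
    tally = {}
    for k in range(n + 1):
        c = math.comb(n, k)
        # C(n, k) distinct words each have probability 1 / ((n + 1) * c);
        # k and n - k share the same c, so they fall into the same tie group.
        tally[c] = tally.get(c, 0) + c
    groups = []
    start = 1  # rank of the first word in the current tie group
    for c in sorted(tally):  # small c means high probability
        size = tally[c]
        mid_rank = start + (size - 1) / 2
        groups.append((mid_rank, 1.0 / ((n + 1) * c), size))
        start += size
    return groups

def loglog_slope(groups):
    """Ordinary least-squares slope of log(probability) against log(rank),
    using one point per tie group. A rough diagnostic only."""
    xs = [math.log(rank) for rank, _, _ in groups]
    ys = [math.log(prob) for _, prob, _ in groups]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

groups = mixture_rank_groups(20)
print(f"log-log slope of the latent-variable mixture: {loglog_slope(groups):.2f}")
```

A single fitted slope is a blunt instrument: it says nothing about how straight the curve is, and it is not one of the verification methods the abstract refers to. It is used here only to make the "mixing produces Zipf-like behaviour" claim concrete.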

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac76/5172588/b9d80cdb1dbd/pcbi.1005110.g001.jpg