Universal entropy of word ordering across linguistic families.

Affiliation

The University of Manchester, Manchester, United Kingdom.

Publication information

PLoS One. 2011;6(5):e19875. doi: 10.1371/journal.pone.0019875. Epub 2011 May 13.

Abstract

BACKGROUND: The language faculty is probably the most distinctive feature of our species, and endows us with a unique ability to exchange highly structured information. In written language, information is encoded by the concatenation of basic symbols under grammatical and semantic constraints. As is also the case in other natural information carriers, the resulting symbolic sequences show a delicate balance between order and disorder. That balance is determined by the interplay between the diversity of symbols and their specific ordering in the sequences. Here we used entropy to quantify the contribution of different organizational levels to the overall statistical structure of language.

METHODOLOGY/PRINCIPAL FINDINGS: We computed a relative entropy measure to quantify the degree of ordering in word sequences from languages belonging to several linguistic families. While a direct estimation of the overall entropy of language yielded values that varied for the different families considered, the relative entropy quantifying word ordering presented an almost constant value for all those families.

CONCLUSIONS/SIGNIFICANCE: Our results indicate that despite the differences in the structure and vocabulary of the languages analyzed, the impact of word ordering on the structure of language is a statistical linguistic universal.
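
The abstract does not say which entropy estimator was used, so the following is only a minimal sketch of the general idea: compare an entropy estimate for a word sequence with the same estimate for randomly shuffled copies, which keep the word frequencies but destroy the ordering. The function names `conditional_bigram_entropy` and `ordering_relative_entropy` are hypothetical, and the plug-in bigram estimator is a simple stand-in chosen for illustration. It captures only adjacent-word correlations and is biased for short texts.

```python
import math
import random
from collections import Counter


def conditional_bigram_entropy(words):
    """Plug-in estimate of H(w_n | w_{n-1}) in bits per word.

    Assumption: a bigram estimator stands in for whatever estimator
    the paper actually used; it only sees adjacent-word structure.
    """
    if len(words) < 2:
        raise ValueError("need at least two words")
    pair_counts = Counter(zip(words, words[1:]))
    context_counts = Counter(words[:-1])
    total_pairs = len(words) - 1
    entropy = 0.0
    for (prev, _next), count in pair_counts.items():
        p_pair = count / total_pairs
        p_next_given_prev = count / context_counts[prev]
        entropy -= p_pair * math.log2(p_next_given_prev)
    return entropy


def ordering_relative_entropy(words, n_shuffles=20, seed=0):
    """Estimated entropy of shuffled text minus that of the original.

    Shuffling preserves word frequencies but removes ordering, so the
    difference is a rough measure of the structure due to word order.
    """
    rng = random.Random(seed)
    h_original = conditional_bigram_entropy(words)
    h_shuffled = 0.0
    for _ in range(n_shuffles):
        shuffled = list(words)
        rng.shuffle(shuffled)
        h_shuffled += conditional_bigram_entropy(shuffled)
    h_shuffled /= n_shuffles
    return h_shuffled - h_original


if __name__ == "__main__":
    # Toy, perfectly ordered "text": each word fully determines the next.
    toy = ["a", "b", "c", "d"] * 50
    print(ordering_relative_entropy(toy))
```

On real corpora the text would first be tokenized into words, and a longer-range estimator would be needed to capture correlations beyond neighbouring words.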

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5b76/3094390/08cd6f38caf2/pone.0019875.g001.jpg
