Klafka Josef, Yurovsky Daniel
Department of Psychology, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA.
Entropy (Basel). 2021 Oct 2;23(10):1300. doi: 10.3390/e23101300.
Optimal coding theories of language predict that speakers will keep the amount of information in their utterances relatively uniform under the constraints imposed by their language. But how much do these constraints influence information structure, and how does this influence vary across languages? We present a novel method for characterizing the information structure of sentences across a diverse set of languages. While the structure of English is broadly consistent with the shape predicted by optimal coding, many languages are not consistent with this prediction. We then show that the characteristic information curves of languages are partly related to a variety of typological features, from phonology to word order. These results are an important step toward exploring upper bounds on the extent to which linguistic codes can be optimal for communication.