Koplenig Alexander, Meyer Peter, Wolfer Sascha, Müller-Spitzer Carolin
Institute for the German Language (IDS), Mannheim, Germany.
PLoS One. 2017 Mar 10;12(3):e0173614. doi: 10.1371/journal.pone.0173614. eCollection 2017.
Languages employ different strategies to transmit structural and grammatical information. For example, in languages like Mandarin Chinese or Vietnamese, grammatical dependency relationships in sentences are mainly conveyed by the ordering of the words, whereas word ordering is much less restricted in languages such as Inupiatun or Quechua, because these languages (also) use the internal structure of words (e.g., inflectional morphology) to mark grammatical relationships in a sentence. Based on a quantitative analysis of more than 1,500 unique translations of different books of the Bible in almost 1,200 different languages that are spoken as a native language by approximately 6 billion people (more than 80% of the world population), we present large-scale evidence for a statistical trade-off between the amount of information conveyed by the ordering of words and the amount of information conveyed by internal word structure: languages that rely more strongly on word order information tend to rely less on word structure information, and vice versa. Put differently, if less information is carried within the word, more information has to be spread among words in order to communicate successfully. In addition, we find that, despite differences in the way information is expressed, there is also evidence for a trade-off between different books of the biblical canon that recurs with little variation across languages: the more informative the word order of a book, the less informative its word structure, and vice versa. We argue that this might suggest that, on the one hand, languages encode information in very different (but efficient) ways, while, on the other hand, content-related and stylistic features are statistically encoded in very similar ways.
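To make the two quantities in the trade-off concrete, the following is a minimal sketch and not the authors' procedure, which the abstract does not describe. It assumes that the size of a zlib-compressed text is an acceptable, if crude, proxy for its information content. Under that assumption, the information carried by word order is approximated by how much worse a text compresses once its words are shuffled, and the information carried by word structure by how much worse it compresses once every word type is replaced by a random string of the same length. All function names are hypothetical.

```python
import random
import zlib


def compressed_bits(text: str) -> int:
    """Size in bits of the zlib-compressed UTF-8 text (a crude information proxy)."""
    return 8 * len(zlib.compress(text.encode("utf-8"), 9))


def shuffle_word_order(words, rng):
    """Destroy word order while keeping every word form intact."""
    shuffled = list(words)
    rng.shuffle(shuffled)
    return shuffled


def mask_word_structure(words, rng):
    """Replace each word type by a random string of the same length.

    The replacement is consistent across tokens, so word order and token
    identity are kept while word-internal regularities are destroyed.
    Simplification: two distinct types may, rarely, receive the same mask.
    """
    alphabet = sorted({ch for w in words for ch in w})
    mapping = {}
    for w in words:
        if w not in mapping:
            mapping[w] = "".join(rng.choice(alphabet) for _ in w)
    return [mapping[w] for w in words]


def information_estimates(text: str, seed: int = 0):
    """Return (word_order_bits, word_structure_bits) per word token.

    Each value is the increase in compressed size, relative to the
    original text, after the corresponding kind of information is destroyed.
    """
    words = text.split()
    if not words:
        raise ValueError("text contains no words")
    rng = random.Random(seed)
    base = compressed_bits(" ".join(words))
    order = compressed_bits(" ".join(shuffle_word_order(words, rng))) - base
    structure = compressed_bits(" ".join(mask_word_structure(words, rng))) - base
    return order / len(words), structure / len(words)
```

Applied to parallel texts in many languages, a trade-off of the kind reported would show up as a negative association between the two returned values. On short inputs the estimates are dominated by compressor overhead, so this sketch is only meaningful for texts of substantial length.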