Ashraf Md Izhar, Sinha Sitabhra
The Institute of Mathematical Sciences, Chennai, Tamil Nadu, India.
B. S. Abdur Rahman University, Chennai, Tamil Nadu, India.
PLoS One. 2018 Jan 17;13(1):e0190735. doi: 10.1371/journal.pone.0190735. eCollection 2018.
Language, which allows complex ideas to be communicated through symbolic sequences, is a characteristic feature of our species and is manifested in a multitude of forms. Using large written corpora for many different languages and scripts, we show that the occurrence probability distributions of signs at the left and right ends of words have a distinct heterogeneous nature. Characterizing this asymmetry using quantitative inequality measures, viz. information entropy and the Gini index, we show that the beginning of a word is less restrictive in sign usage than the end. This property is not simply attributable to the use of common affixes, as it is seen even when only word roots are considered. We use the existence of this asymmetry to infer the direction of writing in undeciphered inscriptions, and the inferred direction agrees with the archaeological evidence. Unlike traditional investigations of phonotactic constraints, which focus on language-specific patterns, our study reveals a property valid across languages and writing systems. As both language and writing are unique aspects of our species, this universal signature may reflect an innate feature of human cognition.
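The two inequality measures named in the abstract can be illustrated with a minimal sketch. It assumes that each character of a word counts as one sign, that both distributions are taken over the same sign inventory (signs never seen at a given end count as zero), and that a toy word list stands in for a corpus. The function names are hypothetical and are not taken from the paper; the Gini index uses the standard sorted-rank formula, which may differ in detail from the authors' computation.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (in bits) of the distribution given by raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def gini_index(counts):
    """Gini index of raw counts: 0 means perfectly even usage,
    values near 1 mean usage concentrated on very few signs."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on ascending-sorted values with ranks 1..n.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def end_sign_stats(words):
    """Entropy and Gini index of signs at the first and last position of words.

    Assumption: one character is one sign, and the sign inventory is every
    character that occurs anywhere in the word list.
    """
    words = [w for w in words if w]
    inventory = sorted({ch for w in words for ch in w})
    first = Counter(w[0] for w in words)
    last = Counter(w[-1] for w in words)
    stats = {}
    for label, counter in (("first", first), ("last", last)):
        counts = [counter.get(sign, 0) for sign in inventory]
        stats[label] = {
            "entropy": shannon_entropy(counts),
            "gini": gini_index(counts),
        }
    return stats


# Toy word list, purely illustrative (not data from the study).
toy_words = ["cat", "bat", "dog", "pig", "hat", "bag"]
toy_stats = end_sign_stats(toy_words)
print(toy_stats)
```

In this reading, a higher entropy and a lower Gini index at one end of the word indicate less restrictive sign usage there. On the toy list the first position is more evenly distributed than the last, which is the direction of asymmetry the abstract reports, though six words carry no evidential weight.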