Seoane Luís F, Solé Ricard
Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (CSIC-UIB), Campus UIB, 07122 Palma de Mallorca, Spain.
ICREA-Complex Systems Lab, Universitat Pompeu Fabra (GRIB), Dr Aiguader 80, 08003 Barcelona, Spain.
Entropy (Basel). 2020 Jan 31;22(2):165. doi: 10.3390/e22020165.
What are the relevant levels of description when investigating human language? How are these levels connected to each other? Does one description give way smoothly to the next, such that different models lie naturally along a hierarchy containing each other? Or, instead, are there sharp transitions between one description and the next, such that to gain a little accuracy it is necessary to change our framework radically? Do different levels describe the same linguistic aspects with increasing (or decreasing) accuracy? Historically, answers to these questions were guided by intuition and resulted in subfields of study, from phonetics to syntax and semantics. The need for research at each level is acknowledged, but these different aspects are seldom brought together (with notable exceptions). Here, we propose a methodology to inspect empirical corpora systematically, and to extract from them, blindly, relevant phenomenological scales and the interactions between them. Our methodology is rigorously grounded in information theory, multi-objective optimization, and statistical physics. Salient levels of linguistic description are readily interpretable in terms of energies, entropies, phase transitions, or criticality. Our results suggest a critical point in the description of human language, indicating that several complementary models are simultaneously necessary (and unavoidable) to describe it.