Yang Jinbiao, van den Bosch Antal, Frank Stefan L
Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands.
Centre for Language Studies, Radboud University, Nijmegen, Netherlands.
Front Artif Intell. 2022 Feb 23;5:731615. doi: 10.3389/frai.2022.731615. eCollection 2022.
Words typically form the basis of psycholinguistic and computational linguistic studies of sentence processing. However, recent evidence shows that the basic units during reading, i.e., the items in the mental lexicon, are not always words but can also be sub-word and supra-word units. To recognize these units, human readers require a cognitive mechanism to learn and detect them. In this paper, we assume that eye fixations during reading reveal the locations of the cognitive units, and that the cognitive units are analogous to the text units discovered by unsupervised segmentation models. We predict eye fixations from model-segmented units on both English and Dutch text. The results show that model-segmented units predict eye fixations better than word units do. This finding suggests that the predictive performance of model-segmented units indicates their plausibility as cognitive units. The Less-is-Better (LiB) model, which finds the units that minimize both long-term and working memory load, has advantages over the alternative models in both prediction score and efficiency. Our results also suggest that modeling the least-effort principle for the management of long-term and working memory can lead to the inference of cognitive units. Overall, the study supports the theory that the mental lexicon stores not only words but also smaller and larger units, suggests that fixation locations during reading depend on these units, and shows that unsupervised segmentation models can discover these units.