Department of Cognitive Science, University of California, San Diego, USA.
Cognition. 2013 Sep;128(3):302-19. doi: 10.1016/j.cognition.2013.02.013. Epub 2013 Jun 6.
It is well known that real-time human language processing is highly incremental and context-driven, and that the strength of a comprehender's expectation for each word encountered is a key determinant of the difficulty of integrating that word into the preceding context. In reading, this differential difficulty is largely manifested in the amount of time taken to read each word. While numerous studies over the past thirty years have shown expectation-based effects on reading times driven by lexical, syntactic, semantic, pragmatic, and other information sources, there has been little progress in establishing the quantitative relationship between expectation (or prediction) and reading times. Here, by combining a state-of-the-art computational language model, two large behavioral datasets, and non-parametric statistical techniques, we establish for the first time the quantitative form of this relationship, finding that it is logarithmic over six orders of magnitude in estimated predictability. This result is problematic for a number of established models of eye movement control in reading, but lends partial support to an optimal perceptual discrimination account of word recognition. We also present a novel model in which language processing is highly incremental well below the level of the individual word, and show that it predicts both the shape and time-course of this effect. At a more general level, this result provides challenges for both anticipatory processing and semantic integration accounts of lexical predictability effects. Finally, this result provides evidence that comprehenders are highly sensitive to relative differences in predictability, even for differences between highly unpredictable words, and thus helps bring theoretical unity to our understanding of the role of prediction at multiple levels of linguistic structure in real-time language comprehension.
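To make the logarithmic claim concrete, the following is a minimal sketch of what such a relationship implies, assuming reading time is modeled as a baseline plus a constant cost per unit of negative log probability. The coefficients `BASELINE_MS` and `MS_PER_BIT` and the function names are hypothetical, chosen only for illustration; they are not estimates from the study.

```python
import math

# Hypothetical coefficients for illustration only (not taken from the study).
BASELINE_MS = 200.0  # assumed reading time for a fully predictable word
MS_PER_BIT = 4.0     # assumed slowdown per bit of negative log probability


def neg_log2_probability(p: float) -> float:
    """Return -log2(p) for a word with conditional probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)


def predicted_reading_time_ms(p: float) -> float:
    """Reading time under an assumed linear-in-log-probability model."""
    return BASELINE_MS + MS_PER_BIT * neg_log2_probability(p)


# Probabilities spanning six orders of magnitude: 1, 1e-1, ..., 1e-6.
probs = [10.0 ** -k for k in range(7)]
times = [predicted_reading_time_ms(p) for p in probs]

# Under a logarithmic relationship, each tenfold drop in probability
# adds the same amount of time, whether the word is likely or very unlikely.
steps = [later - earlier for earlier, later in zip(times, times[1:])]

for p, t in zip(probs, times):
    print(f"p = {p:.0e}  predicted reading time = {t:.1f} ms")
```

Under this assumed form, the step from a probability of 1e-5 to 1e-6 costs as much time as the step from 1 to 0.1, which is one way to read the statement that comprehenders remain sensitive to relative differences even among highly unpredictable words.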