Department of Linguistics, University of Potsdam.
Cogn Sci. 2020 Dec;44(12):e12918. doi: 10.1111/cogs.12918.
Among theories of human language comprehension, cue-based memory retrieval has proven to be a useful framework for understanding when and how processing difficulty arises in the resolution of long-distance dependencies. Most previous work in this area has assumed that very general retrieval cues like [+subject] or [+singular] do the work of identifying (and sometimes misidentifying) a retrieval target in order to establish a dependency between words. However, recent work suggests that general, handpicked retrieval cues like these may not be enough to explain illusions of plausibility (Cunnings & Sturt, 2018), which can arise in sentences like "The letter next to the porcelain plate shattered." Capturing such retrieval interference effects requires lexically specific features and retrieval cues, but handpicking the features is hard to do in a principled way and greatly increases modeler degrees of freedom. To remedy this, we use well-established word embedding methods to create distributed lexical feature representations that encode information relevant for retrieval using distributed retrieval cue vectors. We show that the similarity between the feature and cue vectors (a measure of plausibility) predicts total reading times in Cunnings and Sturt's eye-tracking data. The features can easily be plugged into existing parsing models (including cue-based retrieval and self-organized parsing), putting very different models on more equal footing and facilitating future quantitative comparisons.
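The idea of scoring plausibility as the similarity between a word's feature vector and a retrieval cue vector can be illustrated with a minimal sketch. The abstract does not say which similarity measure is used, so cosine similarity here is an assumption, and the three-dimensional vectors are toy values invented for illustration; they are not embeddings or results from the paper. The names `cosine_similarity`, `features`, and `shatter_cue` are likewise hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy feature vectors (invented values, NOT real word embeddings).
features = {
    "letter": [0.9, 0.1, 0.0],  # intended target of the dependency
    "plate": [0.1, 0.9, 0.2],   # distractor noun
}

# Toy retrieval cue vector standing in for "something that can shatter"
# (also an invented value, assumed for illustration).
shatter_cue = [0.0, 1.0, 0.1]

# Higher similarity = better match between a noun's features and the cue.
similarities = {
    word: cosine_similarity(vec, shatter_cue) for word, vec in features.items()
}

for word, score in similarities.items():
    print(f"{word}: {score:.3f}")
```

With these made-up values the distractor "plate" matches the cue better than the target "letter", which is the kind of configuration in which an illusion of plausibility would be expected. In practice the vectors would come from a trained word embedding model rather than being written by hand.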