Center for Language and Brain, HSE University, Moscow, Russia.
Vinogradov Institute of the Russian Language, Moscow, Russia.
PLoS One. 2021 Jan 28;16(1):e0246133. doi: 10.1371/journal.pone.0246133. eCollection 2021.
During reading or listening, people can generate predictions about the lexical and morphosyntactic properties of upcoming input based on available context. Psycholinguistic experiments that study predictability or control for it conventionally rely on a human-based approach and estimate predictability via the cloze task. Our study investigated an alternative corpus-based approach for estimating predictability via language predictability models. We obtained cloze and corpus-based probabilities for all words in 144 Russian sentences, correlated the two measures, and found a strong correlation between them. Importantly, we estimated how much variance in eye movements registered while reading the same sentences was explained by each of the two probabilities and whether the two probabilities explain the same variance. Along with lexical predictability (the activation of a particular word form), we analyzed morphosyntactic predictability (the activation of morphological features of words) and its effect on reading times over and above lexical predictability. We found that for predicting reading times, cloze and corpus-based measures of both lexical and morphosyntactic predictability explained the same amount of variance. However, cloze and corpus-based lexical probabilities both independently contributed to a better model fit, whereas for morphosyntactic probabilities, the contributions of cloze and corpus-based measures were interchangeable. Therefore, morphosyntactic but not lexical corpus-based probabilities can substitute for cloze probabilities in reading experiments. Our results also indicate that in languages with rich inflectional morphology, such as Russian, when people engage in prediction, they are much more successful in predicting isolated morphosyntactic features than predicting the particular lexeme and its full morphosyntactic markup.
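As a rough illustration of the first step described above (obtaining a cloze probability per word and correlating it with a corpus-based probability), here is a minimal Python sketch. It assumes that cloze probability is the share of participants whose response matches the target word, and that the correlation is a plain Pearson coefficient; the abstract does not specify either choice. The function names and the toy numbers are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import sqrt


def cloze_probability(responses, target):
    """Share of cloze responses that match the target word (case-insensitive).

    Assumption: exact string match after trimming and lowercasing.
    """
    if not responses:
        return 0.0
    counts = Counter(r.strip().lower() for r in responses)
    return counts[target.strip().lower()] / len(responses)


def pearson_r(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-word probabilities, invented for illustration only.
    cloze = [0.75, 0.10, 0.40]
    corpus = [0.60, 0.05, 0.30]
    print(pearson_r(cloze, corpus))
```

The same two helpers could be applied separately to lexical probabilities (matching the full word form) and to morphosyntactic probabilities (matching a single feature such as case or number), which is the distinction the abstract draws.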