School of Computing, Dublin City University, Dublin, Ireland.
ML-Labs, Dublin City University, Dublin, Ireland.
Sci Data. 2024 Oct 9;11(1):1104. doi: 10.1038/s41597-024-03915-8.
This paper introduces DERCo (Dublin EEG-based Reading Experiment Corpus), a language resource combining electroencephalography (EEG) and next-word prediction data obtained from participants reading narrative texts. The dataset comprises behavioral data collected from 500 participants recruited through the Amazon Mechanical Turk online crowd-sourcing platform, along with EEG recordings from 22 healthy adult native English speakers. The online experiment was designed to examine context-based word prediction in a large sample of participants, while the EEG-based experiment was developed to extend the validation of behavioral next-word predictability. Online participants were instructed to predict upcoming words and complete entire stories. Cloze probabilities were then calculated for each word so that this predictability measure could support analyses of semantic context effects in the EEG recordings. EEG-based analyses revealed significant differences between high- and low-predictability words, demonstrating one important type of analysis that requires close integration of the two datasets. Because it provides word-level EEG recordings in context, the corpus is a valuable resource for researchers in neurolinguistics.
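The cloze probability mentioned in the abstract is conventionally defined as the proportion of respondents who produce a given word as the continuation of a context. The sketch below illustrates that standard definition only. The function name `cloze_probabilities`, the lowercasing and whitespace stripping, and the sample responses are assumptions made for illustration; the abstract does not describe the authors' preprocessing, and the values are not taken from the corpus.

```python
from collections import Counter


def cloze_probabilities(responses, target):
    """Return (cloze probability of `target`, full response distribution).

    `responses` is a list of words that participants predicted for one
    position in a text. Normalization (strip + lowercase) is an assumption
    of this sketch, not a documented step of the DERCo pipeline.
    """
    normalized = [r.strip().lower() for r in responses if r.strip()]
    if not normalized:
        raise ValueError("no valid responses")
    counts = Counter(normalized)
    total = len(normalized)
    # Each word's probability is its share of all valid responses.
    distribution = {word: n / total for word, n in counts.items()}
    return distribution.get(target.strip().lower(), 0.0), distribution


# Hypothetical responses from five participants for a single word position.
responses = ["door", "door", "window", "Door", "box"]
p_target, dist = cloze_probabilities(responses, "door")
print(p_target)
```

A word that no participant produced receives a cloze probability of 0.0 under this definition, which is why analyses that split words into high- and low-predictability groups depend on a large respondent sample.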