Department of Computer Science, ETH Zurich, Zurich, Switzerland.
Methods of Plasticity Research, Department of Psychology, University of Zurich, Zurich, Switzerland.
Sci Data. 2018 Dec 11;5:180291. doi: 10.1038/sdata.2018.291.
We present the Zurich Cognitive Language Processing Corpus (ZuCo), a dataset combining electroencephalography (EEG) and eye-tracking recordings from subjects reading natural sentences. ZuCo includes high-density EEG and eye-tracking data from 12 healthy adult native English speakers, each reading natural English text for 4-6 hours. The recordings span two normal reading tasks and one task-specific reading task, resulting in a dataset that encompasses EEG and eye-tracking data for 21,629 words in 1,107 sentences and 154,173 fixations. We believe that this dataset is a valuable resource for natural language processing (NLP). The EEG and eye-tracking signals lend themselves to training improved machine-learning models for various tasks, in particular information extraction tasks such as entity and relation extraction, as well as sentiment analysis. Moreover, the dataset is useful for advancing research into human reading and language understanding at the level of brain activity and eye movements.