Hollenstein Nora, Renggli Cedric, Glaus Benjamin, Barrett Maria, Troendle Marius, Langer Nicolas, Zhang Ce
Department of Nordic Studies and Linguistics, University of Copenhagen, Copenhagen, Denmark.
Department of Computer Science, Swiss Federal Institute of Technology, ETH Zurich, Zurich, Switzerland.
Front Hum Neurosci. 2021 Jul 13;15:659410. doi: 10.3389/fnhum.2021.659410. eCollection 2021.
Until recently, human behavioral data from reading has mainly interested researchers as a way to understand human cognition. However, these human language processing signals can also benefit machine learning-based natural language processing tasks. Using EEG brain activity for this purpose is still largely unexplored. In this paper, we present the first large-scale study that systematically analyzes the potential of EEG brain activity data for improving natural language processing tasks, with a special focus on which features of the signal are most beneficial. We present a multi-modal machine learning architecture that learns jointly from textual input and from EEG features. We find that filtering the EEG signals into frequency bands is more beneficial than using the broadband signal. Moreover, for a range of word embedding types, EEG data improves binary and ternary sentiment classification and outperforms multiple baselines. For more complex tasks such as relation detection, only the contextualized BERT embeddings outperform the baselines in our experiments, which calls for further research. Finally, EEG data proves particularly promising when limited training data is available.
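As a rough illustration of the two ideas in the abstract (frequency-band EEG features, and learning jointly from text and EEG), the following minimal sketch may help. It is not the authors' implementation. The band edges in `BANDS`, the helper names `power_spectrum`, `band_powers`, and `fuse`, the synthetic signal, and the toy embedding are all assumptions made for this example. It also summarizes each band by its spectral power rather than band-pass filtering the time-domain signal, and it combines the two modalities by plain concatenation, which is only one simple way a model could receive both inputs.

```python
import cmath
import math

# Hypothetical band edges in Hz (assumed for illustration; not taken from the paper).
BANDS = {
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 50.0),
}


def power_spectrum(signal, sample_rate):
    """Return (frequencies, power) for the non-negative half of a naive DFT."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        freqs.append(k * sample_rate / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def band_powers(signal, sample_rate, bands=BANDS):
    """Sum the spectral power that falls inside each band: one feature per band."""
    freqs, power = power_spectrum(signal, sample_rate)
    return {
        name: sum(p for f, p in zip(freqs, power) if low <= f < high)
        for name, (low, high) in bands.items()
    }


def fuse(text_embedding, eeg_features):
    """Concatenate a word's text embedding with its EEG band features."""
    return list(text_embedding) + list(eeg_features)


# Synthetic one-second recording: a 10 Hz sine wave sampled at 128 Hz.
sample_rate = 128
signal = [math.sin(2 * math.pi * 10 * t / sample_rate) for t in range(sample_rate)]

features = band_powers(signal, sample_rate)
dominant_band = max(features, key=features.get)

# Toy 3-dimensional "word embedding" joined with the four band features.
fused = fuse([0.1, -0.3, 0.7], [features[name] for name in BANDS])
```

On this synthetic input almost all of the power lands in the assumed alpha band, so `dominant_band` is `"alpha"`, and `fused` has seven entries (three embedding values plus four band features). A broadband baseline would correspond to replacing the four band features with a single total-power value.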