

ChineseEEG: A Chinese Linguistic Corpora EEG Dataset for Semantic Alignment and Neural Decoding.

Affiliations

Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, China.

Centre for Cognitive and Brain Sciences, Department of Psychology, Faculty of Social Sciences, University of Macau, Taipa, Macau SAR, China.

Publication information

Sci Data. 2024 May 29;11(1):550. doi: 10.1038/s41597-024-03398-7.

Abstract

An electroencephalography (EEG) dataset that uses rich text stimuli can advance our understanding of how the brain encodes semantic information and support semantic decoding in brain-computer interfaces (BCIs). To address the scarcity of EEG datasets featuring Chinese linguistic stimuli, we present the ChineseEEG dataset, a high-density EEG dataset complemented by simultaneous eye-tracking recordings. The data were collected while 10 participants silently read approximately 13 hours of Chinese text from two well-known novels. The dataset provides long-duration EEG recordings, along with pre-processed sensor-level EEG data and semantic embeddings of the reading materials extracted by a pre-trained natural language processing (NLP) model. As a pilot EEG dataset derived from natural Chinese linguistic stimuli, ChineseEEG can support research across neuroscience, NLP, and linguistics. It establishes a benchmark dataset for Chinese semantic decoding, aids the development of BCIs, and facilitates exploration of the alignment between large language models and human cognitive processes. It can also support research into the brain's mechanisms of language processing in the context of natural Chinese.
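
The abstract frames ChineseEEG as a benchmark for semantic decoding, that is, relating EEG recordings to the text embeddings distributed with them. As a minimal sketch, assuming the EEG has already been reduced to one feature vector per text segment, a linear ridge-regression decoder from EEG features to embedding vectors could be fit and scored as below. Everything here is synthetic and hypothetical (the helper names `fit_ridge` and `mean_dimwise_correlation`, the array shapes, and the noise level); it does not read the actual dataset files and is not the authors' method.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ Y)

def mean_dimwise_correlation(Y_true, Y_pred):
    """Average Pearson correlation between true and predicted values, per embedding dimension."""
    Yt = Y_true - Y_true.mean(axis=0)
    Yp = Y_pred - Y_pred.mean(axis=0)
    num = (Yt * Yp).sum(axis=0)
    den = np.sqrt((Yt ** 2).sum(axis=0) * (Yp ** 2).sum(axis=0))
    return float(np.mean(num / den))

# Synthetic stand-ins (hypothetical shapes, NOT the real ChineseEEG data):
rng = np.random.default_rng(0)
n_segments, n_eeg_features, emb_dim = 200, 32, 8
X = rng.standard_normal((n_segments, n_eeg_features))   # per-segment EEG features
W_true = rng.standard_normal((n_eeg_features, emb_dim))  # hidden linear mapping
Y = X @ W_true + 0.5 * rng.standard_normal((n_segments, emb_dim))  # "text embeddings"

# Fit on the first 160 segments, evaluate on the held-out 40.
train, test = slice(0, 160), slice(160, None)
W_hat = fit_ridge(X[train], Y[train], alpha=1.0)
score = mean_dimwise_correlation(Y[test], X[test] @ W_hat)
print(f"held-out mean correlation: {score:.3f}")
```

A linear decoder is used here only because it is a common, transparent baseline; with real recordings, the segment-level features, the train/test split (e.g. by chapter or by participant), and the regularization strength would all need to be chosen to suit the data.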


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c1b/11137001/4c99439aefad/41597_2024_3398_Fig1_HTML.jpg
