Bai Yanru, Tang Qi, Zhao Ran, Liu Hongxing, Zhang Shuming, Guo Mingkun, Guo Minghan, Wang Junjie, Wang Changjian, Xing Mu, Ni Guangjian, Ming Dong
Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, 300072, China.
Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin, 300072, China.
Sci Data. 2025 Apr 25;12(1):701. doi: 10.1038/s41597-025-05036-2.
Semantic understanding is central to advanced cognitive functions, yet the mechanisms by which the brain processes language information are still being explored. Existing EEG datasets often lack natural reading data specific to Chinese, which limits research on Chinese semantic decoding and natural language processing. This study constructs TMNRED, a Chinese natural reading EEG dataset for semantic target identification in natural reading environments. TMNRED was collected from 30 participants reading sentences sourced from public internet resources and media reports. Each participant completed 400-450 trials in a single day, yielding a dataset with over 10 hours of continuous EEG data and more than 4000 trials. The dataset provides physiological data for studying Chinese semantics and for developing more accurate Chinese natural language processing models.