Neural correlates of word representation vectors in natural language processing models: Evidence from representational similarity analysis of event-related brain potentials.

Author affiliations

Center for Neuroscience, University of California, Davis, California, USA.

Department of Linguistics, University of California, Davis, California, USA.

Publication information

Psychophysiology. 2022 Mar;59(3):e13976. doi: 10.1111/psyp.13976. Epub 2021 Nov 24.

Abstract

Natural language processing models based on machine learning (ML-NLP models) have been developed to solve practical problems, such as interpreting an Internet search query. These models are not intended to reflect human language comprehension mechanisms, and the word representations used by ML-NLP models and human brains might therefore be quite different. However, because ML-NLP models are trained with the same kinds of inputs that humans must process, and they must solve many of the same computational problems as the human brain, ML-NLP models and human brains may end up with similar word representations. To distinguish between these hypotheses, we used representational similarity analysis to compare the representational geometry of word representations in two ML-NLP models with the representational geometry of the human brain, as indexed with event-related potentials (ERPs). Participants listened to stories while the electroencephalogram was recorded. We extracted averaged ERPs for each of the 100 words that occurred most frequently in the stories, and we calculated the similarity of the neural response for each pair of words. We compared this 100 × 100 similarity matrix to the 100 × 100 similarity matrix for the word pairs according to two ML-NLP models. We found significant representational similarity between the neural data and each ML-NLP model, beginning within 250 ms of word onset. These results indicate that ML-NLP systems that are designed to solve practical technology problems have a representational geometry that is correlated with that of the human brain, presumably because both are influenced by the structural properties and statistics of language.
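The comparison described in the abstract can be illustrated with a minimal sketch of representational similarity analysis. It is not the authors' pipeline. The abstract does not say which similarity measure or statistical test was used, so Pearson correlation between word patterns and a Spearman-style rank correlation between the two matrices are assumptions here. The data are synthetic stand-ins, and every name (`similarity_matrix`, `rsa_score`, `neural_patterns`, and so on) is hypothetical.

```python
import numpy as np


def similarity_matrix(patterns):
    """Pairwise Pearson correlation between rows (one row per word)."""
    return np.corrcoef(patterns)


def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries, one per unordered word pair."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def rank(values):
    """Ordinal ranks (ties are not averaged; adequate for continuous data)."""
    ranks = np.empty(len(values))
    ranks[np.argsort(values)] = np.arange(len(values))
    return ranks


def rsa_score(neural_patterns, model_vectors):
    """Rank correlation between the two sets of pairwise similarities."""
    neural = upper_triangle(similarity_matrix(neural_patterns))
    model = upper_triangle(similarity_matrix(model_vectors))
    return np.corrcoef(rank(neural), rank(model))[0, 1]


# Synthetic stand-ins (assumption): 100 words, 50-dimensional model vectors,
# and 32-channel "ERP" patterns built as a noisy projection of those vectors.
rng = np.random.default_rng(0)
n_words = 100
model_vectors = rng.normal(size=(n_words, 50))
projection = rng.normal(size=(50, 32))
neural_patterns = model_vectors @ projection + rng.normal(size=(n_words, 32))
unrelated_patterns = rng.normal(size=(n_words, 32))

related_score = rsa_score(neural_patterns, model_vectors)
unrelated_score = rsa_score(unrelated_patterns, model_vectors)
print(f"related: {related_score:.3f}, unrelated: {unrelated_score:.3f}")
```

Only the 4,950 off-diagonal word pairs enter the comparison, because the diagonal of each 100 × 100 matrix is trivially 1. To follow how the similarity evolves after word onset, the same score could be computed separately for each time point of the ERP.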
