
Lexical semantic content, not syntactic structure, is the main contributor to ANN-brain similarity of fMRI responses in the language network.

Authors

Kauf Carina, Tuckute Greta, Levy Roger, Andreas Jacob, Fedorenko Evelina

Affiliations

Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology.

McGovern Institute for Brain Research, Massachusetts Institute of Technology.

Publication information

bioRxiv. 2023 May 6:2023.05.05.539646. doi: 10.1101/2023.05.05.539646.

Abstract

Representations from artificial neural network (ANN) language models have been shown to predict human brain activity in the language network. To understand what aspects of linguistic stimuli contribute to ANN-to-brain similarity, we used an fMRI dataset of responses to n = 627 naturalistic English sentences (Pereira et al., 2018) and systematically manipulated the stimuli for which ANN representations were extracted. In particular, we i) perturbed sentences' word order, ii) removed different subsets of words, or iii) replaced sentences with other sentences of varying semantic similarity. We found that the lexical semantic content of the sentence (largely carried by content words) rather than the sentence's syntactic form (conveyed via word order or function words) is primarily responsible for the ANN-to-brain similarity. In follow-up analyses, we found that perturbation manipulations that adversely affect brain predictivity also lead to more divergent representations in the ANN's embedding space and decrease the ANN's ability to predict upcoming tokens in those stimuli. Further, results are robust to whether the mapping model is trained on intact or perturbed stimuli, and whether the ANN sentence representations are conditioned on the same linguistic context that humans saw. The critical result, that lexical-semantic content is the main contributor to the similarity between ANN representations and neural ones, aligns with the idea that the goal of the human language system is to extract meaning from linguistic strings. Finally, this work highlights the strength of systematic experimental manipulations for evaluating how close we are to accurate and generalizable models of the human language network.
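
As a rough illustration of the first two manipulation types named in the abstract (word-order perturbation and removal of word subsets), the stimulus changes might be sketched as below. This is a minimal sketch and not the authors' code: the function names, the tiny FUNCTION_WORDS list, and the example sentence are all hypothetical, and the abstract does not say exactly how words were shuffled or how content and function words were told apart.

```python
import random

# Hypothetical, deliberately tiny function-word list (an assumption for
# illustration only; the abstract does not specify how words were classified).
FUNCTION_WORDS = {"a", "an", "the", "of", "in", "on", "to", "is", "are",
                  "and", "or", "that", "by", "with"}


def _bare(word):
    """Lowercase a word and strip surrounding punctuation for lookup."""
    return word.lower().strip(".,;:!?")


def scramble_word_order(sentence, seed=0):
    """Return the sentence with its words randomly permuted.

    The set of words is unchanged; only their order differs.
    """
    words = sentence.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)


def keep_content_words(sentence, function_words=FUNCTION_WORDS):
    """Drop function words, keeping the remaining words in original order."""
    return " ".join(w for w in sentence.split()
                    if _bare(w) not in function_words)


def keep_function_words(sentence, function_words=FUNCTION_WORDS):
    """Drop content words, keeping only function words in original order."""
    return " ".join(w for w in sentence.split()
                    if _bare(w) in function_words)


if __name__ == "__main__":
    s = "The beekeeper collects honey in the summer."  # made-up example
    print(scramble_word_order(s))
    print(keep_content_words(s))
    print(keep_function_words(s))
```

In a setup like the one the abstract describes, each perturbed string would be fed to the language model in place of the original sentence, and the resulting representations would be compared with the fMRI responses recorded for the intact sentence.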

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b844/10187317/4b96da6b7947/nihpp-2023.05.05.539646v1-f0001.jpg
