


Brain encoding models based on multimodal transformers can transfer across language and vision.

Authors

Tang Jerry, Du Meng, Vo Vy A, Lal Vasudev, Huth Alexander G

Affiliations

UT Austin.

Intel Labs, UCLA.

Publication

Adv Neural Inf Process Syst. 2023 Dec;36:29654-29666.

PMID: 39015152
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11250991/
Abstract

Encoding models have been used to assess how the human brain represents concepts in language and vision. While language and vision rely on similar concept representations, current encoding models are typically trained and tested on brain responses to each modality in isolation. Recent advances in multimodal pretraining have produced transformers that can extract aligned representations of concepts in language and vision. In this work, we used representations from multimodal transformers to train encoding models that can transfer across fMRI responses to stories and movies. We found that encoding models trained on brain responses to one modality can successfully predict brain responses to the other modality, particularly in cortical regions that represent conceptual meaning. Further analysis of these encoding models revealed shared semantic dimensions that underlie concept representations in language and vision. Comparing encoding models trained using representations from multimodal and unimodal transformers, we found that multimodal transformers learn more aligned representations of concepts in language and vision. Our results demonstrate how multimodal transformers can provide insights into the brain's capacity for multimodal processing.
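The cross-modality transfer test described in the abstract can be sketched as a voxelwise encoding model: fit a regularized linear map from stimulus features to per-voxel responses on one modality, then evaluate prediction accuracy on the other. The sketch below is a minimal illustration on synthetic data, not the authors' code; the feature dimensionality, noise level, shared linear map, and use of scikit-learn's `Ridge` are all assumptions for demonstration.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Synthetic setup (illustration only): a shared voxelwise linear map W
# turns d-dimensional "transformer" features into v simulated voxel
# responses for both a language (story) and a vision (movie) condition.
d, v, n = 64, 100, 500
W = rng.standard_normal((d, v))

X_story = rng.standard_normal((n, d))   # language-condition features
X_movie = rng.standard_normal((n, d))   # vision-condition features
Y_story = X_story @ W + 0.5 * rng.standard_normal((n, v))
Y_movie = X_movie @ W + 0.5 * rng.standard_normal((n, v))

# Train a voxelwise ridge encoding model on the story data only.
model = Ridge(alpha=10.0).fit(X_story, Y_story)

def voxel_corr(Y_true, Y_pred):
    """Mean Pearson correlation between true and predicted responses,
    computed per voxel (column) and averaged across voxels."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    r = (yt * yp).sum(axis=0) / np.sqrt((yt**2).sum(axis=0) * (yp**2).sum(axis=0))
    return r.mean()

within = voxel_corr(Y_story, model.predict(X_story))  # same modality
cross = voxel_corr(Y_movie, model.predict(X_movie))   # transfer
print(f"within-modality r = {within:.2f}, cross-modality r = {cross:.2f}")
```

Because the synthetic voxels share one linear map across conditions, the story-trained model also predicts the movie responses; in the paper's setting, high cross-modality correlation in a voxel is the evidence that it carries modality-invariant conceptual representations.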


Similar articles

1. Brain encoding models based on multimodal transformers can transfer across language and vision.
   Adv Neural Inf Process Syst. 2023 Dec;36:29654-29666.
2. Revealing Vision-Language Integration in the Brain with Multimodal Networks.
   ArXiv. 2024 Jun 20:arXiv:2406.14481v1.
3. What Does a Language-And-Vision Transformer See: The Impact of Semantic Information on Visual Representations.
   Front Artif Intell. 2021 Dec 3;4:767971. doi: 10.3389/frai.2021.767971. eCollection 2021.
4. Do it the transformer way: A comprehensive review of brain and vision transformers for autism spectrum disorder diagnosis and classification.
   Comput Biol Med. 2023 Dec;167:107667. doi: 10.1016/j.compbiomed.2023.107667. Epub 2023 Nov 3.
5. Transformers bridge vision and language to estimate and understand scene meaning.
   Res Sq. 2023 May 29:rs.3.rs-2968381. doi: 10.21203/rs.3.rs-2968381/v1.
6. Deep Artificial Neural Networks Reveal a Distributed Cortical Network Encoding Propositional Sentence-Level Meaning.
   J Neurosci. 2021 May 5;41(18):4100-4119. doi: 10.1523/JNEUROSCI.1152-20.2021. Epub 2021 Mar 22.
7. The Representation of Semantic Information Across Human Cerebral Cortex During Listening Versus Reading Is Invariant to Stimulus Modality.
   J Neurosci. 2019 Sep 25;39(39):7722-7736. doi: 10.1523/JNEUROSCI.0675-19.2019. Epub 2019 Aug 19.
8. Heteromodal Cortical Areas Encode Sensory-Motor Features of Word Meaning.
   J Neurosci. 2016 Sep 21;36(38):9763-9. doi: 10.1523/JNEUROSCI.4095-15.2016.
9. Visual and Affective Multimodal Models of Word Meaning in Language and Mind.
   Cogn Sci. 2021 Jan;45(1):e12922. doi: 10.1111/cogs.12922.
10. VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset.
    IEEE Trans Pattern Anal Mach Intell. 2025 Feb;47(2):708-724. doi: 10.1109/TPAMI.2024.3479776. Epub 2025 Jan 9.

Cited by

1. High-level visual representations in the human brain are aligned with large language models.
   Nat Mach Intell. 2025;7(8):1220-1234. doi: 10.1038/s42256-025-01072-0. Epub 2025 Aug 7.
2. Multi-voxel pattern analysis for developmental cognitive neuroscientists.
   Dev Cogn Neurosci. 2025 Mar 25;73:101555. doi: 10.1016/j.dcn.2025.101555.
3. Semantic language decoding across participants and stimulus modalities.

References

1. Semantic reconstruction of continuous language from non-invasive brain recordings.
   Nat Neurosci. 2023 May;26(5):858-866. doi: 10.1038/s41593-023-01304-9. Epub 2023 May 1.
2. Shared computational principles for language processing in humans and deep language models.
   Nat Neurosci. 2022 Mar;25(3):369-380. doi: 10.1038/s41593-022-01026-4. Epub 2022 Mar 7.
3. Brains and algorithms partially converge in natural language processing.
   Curr Biol. 2025 Mar 10;35(5):1023-1032.e6. doi: 10.1016/j.cub.2025.01.024. Epub 2025 Feb 6.
4. Multisensory naturalistic decoding with high-density diffuse optical tomography.
   Neurophotonics. 2025 Jan;12(1):015002. doi: 10.1117/1.NPh.12.1.015002. Epub 2025 Jan 23.
5. A large-scale examination of inductive biases shaping high-level visual representation in brains and machines.
   Nat Commun. 2024 Oct 30;15(1):9383. doi: 10.1038/s41467-024-53147-y.
6. A functional account of stimulation-based aerobic glycolysis and its role in interpreting BOLD signal intensity increases in neuroimaging experiments.
   Neurosci Biobehav Rev. 2023 Oct;153:105373. doi: 10.1016/j.neubiorev.2023.105373. Epub 2023 Aug 25.
7. The Multiscale Surface Vision Transformer.
   ArXiv. 2024 Jun 11:arXiv:2303.11909v3.
8. Commun Biol. 2022 Feb 16;5(1):134. doi: 10.1038/s42003-022-03036-1.
9. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence.
   Nat Neurosci. 2022 Jan;25(1):116-126. doi: 10.1038/s41593-021-00962-x. Epub 2021 Dec 16.
10. The neural architecture of language: Integrative modeling converges on predictive processing.
    Proc Natl Acad Sci U S A. 2021 Nov 9;118(45). doi: 10.1073/pnas.2105646118.
11. Voxelwise Encoding Models Show That Cerebellar Language Representations Are Highly Conceptual.
    J Neurosci. 2021 Dec 15;41(50):10341-10355. doi: 10.1523/JNEUROSCI.0118-21.2021. Epub 2021 Nov 3.
12. Visual and linguistic semantic representations are aligned at the border of human visual cortex.
    Nat Neurosci. 2021 Nov;24(11):1628-1636. doi: 10.1038/s41593-021-00921-6. Epub 2021 Oct 28.
13. Visual Organization of the Default Network.
    Cereb Cortex. 2020 May 18;30(6):3518-3527. doi: 10.1093/cercor/bhz323.
14. The Representation of Semantic Information Across Human Cerebral Cortex During Listening Versus Reading Is Invariant to Stimulus Modality.
    J Neurosci. 2019 Sep 25;39(39):7722-7736. doi: 10.1523/JNEUROSCI.0675-19.2019. Epub 2019 Aug 19.
15. Voxelwise encoding models with non-spherical multivariate normal priors.
    Neuroimage. 2019 Aug 15;197:482-492. doi: 10.1016/j.neuroimage.2019.04.012. Epub 2019 May 7.