Brain encoding models based on multimodal transformers can transfer across language and vision.

Authors

Tang Jerry, Du Meng, Vo Vy A, Lal Vasudev, Huth Alexander G

Affiliations

UT Austin.

Intel Labs, UCLA.

Publication

Adv Neural Inf Process Syst. 2023 Dec;36:29654-29666.

Abstract

Encoding models have been used to assess how the human brain represents concepts in language and vision. While language and vision rely on similar concept representations, current encoding models are typically trained and tested on brain responses to each modality in isolation. Recent advances in multimodal pretraining have produced transformers that can extract aligned representations of concepts in language and vision. In this work, we used representations from multimodal transformers to train encoding models that can transfer across fMRI responses to stories and movies. We found that encoding models trained on brain responses to one modality can successfully predict brain responses to the other modality, particularly in cortical regions that represent conceptual meaning. Further analysis of these encoding models revealed shared semantic dimensions that underlie concept representations in language and vision. Comparing encoding models trained using representations from multimodal and unimodal transformers, we found that multimodal transformers learn more aligned representations of concepts in language and vision. Our results demonstrate how multimodal transformers can provide insights into the brain's capacity for multimodal processing.
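
The cross-modal transfer described in the abstract lends itself to a short sketch. The following is a minimal illustration, not the authors' released code: a voxelwise linear (ridge) encoding model is fit from multimodal-transformer stimulus features to fMRI responses to stories, and the learned weights are then reused to predict responses to movies. Feature extraction from the transformer is assumed to have happened upstream; all names and dimensions are hypothetical, and random arrays stand in for real features and brain data.

```python
# Illustrative sketch of cross-modal encoding-model transfer.
# Assumes stimulus features were already extracted from a multimodal
# transformer and aligned to the fMRI TRs; arrays below are placeholders.
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)

# Toy dimensions; real data would have one row per fMRI TR.
n_story, n_movie, n_feat, n_vox = 300, 200, 64, 500

X_story = rng.standard_normal((n_story, n_feat))  # language-stimulus features
Y_story = rng.standard_normal((n_story, n_vox))   # responses to stories
X_movie = rng.standard_normal((n_movie, n_feat))  # vision-stimulus features
Y_movie = rng.standard_normal((n_movie, n_vox))   # responses to movies

# Fit a voxelwise ridge encoding model on the language modality only.
model = RidgeCV(alphas=np.logspace(0, 4, 10))
model.fit(X_story, Y_story)

# Cross-modal transfer: apply story-trained weights to movie features.
Y_pred = model.predict(X_movie)

def voxelwise_corr(a, b):
    """Pearson correlation between matching columns of a and b."""
    a = (a - a.mean(0)) / a.std(0)
    b = (b - b.mean(0)) / b.std(0)
    return (a * b).mean(0)

# Score each voxel by correlating predicted and observed responses.
r = voxelwise_corr(Y_pred, Y_movie)
print(f"mean cross-modal prediction correlation: {r.mean():.3f}")
```

With random placeholder data the correlations hover near zero; the paper's finding is that with real transformer features and real brain responses, this transfer succeeds in cortical regions that represent conceptual meaning.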

Similar articles

VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset.
IEEE Trans Pattern Anal Mach Intell. 2025 Feb;47(2):708-724. doi: 10.1109/TPAMI.2024.3479776. Epub 2025 Jan 9.

Cited by

Semantic language decoding across participants and stimulus modalities.
Curr Biol. 2025 Mar 10;35(5):1023-1032.e6. doi: 10.1016/j.cub.2025.01.024. Epub 2025 Feb 6.

References

Visual Organization of the Default Network.
Cereb Cortex. 2020 May 18;30(6):3518-3527. doi: 10.1093/cercor/bhz323.
Voxelwise encoding models with non-spherical multivariate normal priors.
Neuroimage. 2019 Aug 15;197:482-492. doi: 10.1016/j.neuroimage.2019.04.012. Epub 2019 May 7.
