Department of Psychology, Harvard University, Cambridge, MA, USA.
Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota, Minneapolis, MN, USA.
Nat Commun. 2024 Oct 30;15(1):9383. doi: 10.1038/s41467-024-53147-y.
The rapid release of high-performing computer vision models offers new potential to study the impact of different inductive biases on the emergent brain alignment of learned representations. Here, we perform controlled comparisons among a curated set of 224 diverse models to test the impact of specific model properties on visual brain predictivity, a process requiring over 1.8 billion regressions and 50.3 thousand representational similarity analyses. We find that models with qualitatively different architectures (e.g., CNNs versus Transformers) and task objectives (e.g., purely visual contrastive learning versus vision-language alignment) achieve near-equivalent brain predictivity when other factors are held constant. Instead, variation across visual training diets yields the largest, most consistent effect on brain predictivity. Many models achieve similarly high brain predictivity despite clear variation in their underlying representations, suggesting that the standard methods used to link models to brains may be too flexible. Broadly, these findings challenge common assumptions about the factors underlying emergent brain alignment, and outline how controlled model comparison can be leveraged to probe the common computational principles underlying biological and artificial visual systems.
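The representational similarity analyses mentioned above follow a standard recipe: build a representational dissimilarity matrix (RDM) over stimuli for both the model and the brain data, then correlate their upper triangles. The sketch below is a minimal illustration of that generic RSA procedure, not the paper's actual pipeline; the array shapes, the use of 1 − Pearson correlation as the dissimilarity measure, and the toy data are all assumptions for demonstration.

```python
import numpy as np
from scipy.stats import spearmanr

def rdm(features):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between the response patterns for each pair of stimuli (rows = stimuli)."""
    return 1.0 - np.corrcoef(features)

def rsa_score(model_features, brain_responses):
    """Spearman correlation between the upper triangles of the model RDM
    and the brain RDM -- a common representational similarity score."""
    m, b = rdm(model_features), rdm(brain_responses)
    iu = np.triu_indices_from(m, k=1)  # off-diagonal upper triangle only
    rho, _ = spearmanr(m[iu], b[iu])
    return rho

# Toy example (hypothetical data): 20 stimuli, 64 model features, 100 voxels,
# with the "brain" responses constructed as a noisy linear readout of the model.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 64))                                  # model activations
Y = X @ rng.normal(size=(64, 100)) * 0.5 + rng.normal(size=(20, 100))
print(rsa_score(X, Y))
```

Because the toy brain responses share structure with the model features, the score comes out positive; fully unrelated data would hover near zero.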