Department of Psychology, Harvard University, Cambridge, MA, USA.
Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota, Minneapolis, MN, USA.
Nat Commun. 2024 Oct 30;15(1):9383. doi: 10.1038/s41467-024-53147-y.
The rapid release of high-performing computer vision models offers new potential to study the impact of different inductive biases on the emergent brain alignment of learned representations. Here, we perform controlled comparisons among a curated set of 224 diverse models to test the impact of specific model properties on visual brain predictivity, a process requiring over 1.8 billion regressions and 50.3 thousand representational similarity analyses. We find that models with qualitatively different architectures (e.g., CNNs versus Transformers) and task objectives (e.g., purely visual contrastive learning versus vision-language alignment) achieve near-equivalent brain predictivity when other factors are held constant. Instead, variation across visual training diets yields the largest, most consistent effect on brain predictivity. Many models achieve similarly high brain predictivity despite clear variation in their underlying representations, suggesting that the standard methods used to link models to brains may be too flexible. Broadly, these findings challenge common assumptions about the factors underlying emergent brain alignment, and outline how controlled model comparison can be leveraged to probe the common computational principles underlying biological and artificial visual systems.
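The representational similarity analyses mentioned above follow a standard recipe: build a representational dissimilarity matrix (RDM) over stimuli for both the model and the brain data, then correlate their upper triangles. The sketch below is a minimal illustration of that generic RSA procedure, not the paper's actual pipeline; the array shapes, the use of 1 − Pearson correlation as the dissimilarity measure, and the toy data are all assumptions for demonstration.

```python
import numpy as np
from scipy.stats import spearmanr

def rdm(features):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between the response patterns for each pair of stimuli (rows = stimuli)."""
    return 1.0 - np.corrcoef(features)

def rsa_score(model_features, brain_responses):
    """Spearman correlation between the upper triangles of the model RDM
    and the brain RDM -- a common representational similarity score."""
    m, b = rdm(model_features), rdm(brain_responses)
    iu = np.triu_indices_from(m, k=1)  # off-diagonal upper triangle only
    rho, _ = spearmanr(m[iu], b[iu])
    return rho

# Toy example (hypothetical data): 20 stimuli, 64 model features, 100 voxels,
# with the "brain" responses constructed as a noisy linear readout of the model.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 64))                                  # model activations
Y = X @ rng.normal(size=(64, 100)) * 0.5 + rng.normal(size=(20, 100))
print(rsa_score(X, Y))
```

Because the toy brain responses share structure with the model features, the score comes out positive; fully unrelated data would hover near zero.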