An ecologically motivated image dataset for deep learning yields better models of human vision.

Affiliations

MRC Cognition and Brain Sciences Unit, University of Cambridge, CB2 7EF Cambridge, United Kingdom.

Department of Psychology, Zuckerman Institute, Columbia University, New York, NY 10027.

Publication information

Proc Natl Acad Sci U S A. 2021 Feb 23;118(8). doi: 10.1073/pnas.2011417118.

DOI:10.1073/pnas.2011417118

PMID:33593900

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7923360/

Abstract

Deep neural networks provide the current best models of visual information processing in the primate brain. Drawing on work from computer vision, the most commonly used networks are pretrained on data from the ImageNet Large Scale Visual Recognition Challenge. This dataset comprises images from 1,000 categories, selected to provide a challenging testbed for automated visual object recognition systems. Moving beyond this common practice, we here introduce ecoset, a collection of >1.5 million images from 565 basic-level categories selected to better capture the distribution of objects relevant to humans. Ecoset categories were chosen to be both frequent in linguistic usage and concrete, thereby mirroring important physical objects in the world. We test the effects of training on this ecologically more valid dataset using multiple instances of two neural network architectures: AlexNet and vNet, a novel architecture designed to mimic the progressive increase in receptive field sizes along the human ventral stream. We show that training on ecoset leads to significant improvements in predicting representations in human higher-level visual cortex and perceptual judgments, surpassing the previous state of the art. Significant and highly consistent benefits are demonstrated for both architectures on two separate functional magnetic resonance imaging (fMRI) datasets and behavioral data, jointly covering responses to 1,292 visual stimuli from a wide variety of object categories. These results suggest that computational visual neuroscience may take better advantage of the deep learning framework by using image sets that reflect the human perceptual and cognitive experience. Ecoset and trained network models are openly available to the research community.
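As a rough illustration of what a "progressive increase in receptive field sizes" means for a stack of convolution and pooling layers, the sketch below applies the standard receptive-field recurrence to a list of (kernel size, stride) pairs. The layer list is purely hypothetical and is not the published vNet configuration; the function name `receptive_fields` is likewise an assumption made for this example.

```python
def receptive_fields(layers):
    """Return the receptive field size (in input pixels) after each layer.

    `layers` is a list of (kernel_size, stride) pairs applied in order.
    Uses the standard recurrence: rf += (kernel - 1) * jump; jump *= stride.
    """
    rf = 1    # a single input pixel covers itself
    jump = 1  # spacing, in input pixels, between neighboring units
    sizes = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        sizes.append(rf)
    return sizes


# Hypothetical stack for illustration only (NOT the vNet architecture):
# convolutions with stride 1 interleaved with 2x2 pooling at stride 2.
hypothetical_layers = [(7, 1), (2, 2), (7, 1), (2, 2), (5, 1), (2, 2), (3, 1)]
sizes = receptive_fields(hypothetical_layers)
print(sizes)
```

Under these assumed layer settings, each successive layer sees a larger patch of the input, which is the qualitative property the abstract attributes to vNet.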

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62ed/7923360/bdecc5da288a/pnas.2011417118fig01.jpg
