
Constrained sampling from deep generative image models reveals mechanisms of human target detection.

Affiliations

Department of Psychology, Centre for Vision Research & Vision: Science to Application, York University, Toronto, ON, Canada.

Publication information

J Vis. 2020 Jul 1;20(7):32. doi: 10.1167/jov.20.7.32.

Abstract

The first steps of visual processing are often described as a bank of oriented filters followed by divisive normalization. This approach has been tremendously successful at predicting contrast thresholds in simple visual displays. However, it is unclear to what extent this kind of architecture also supports processing in more complex visual tasks performed on natural-looking images. We used a deep generative image model to embed arc segments with different curvatures in naturalistic images. These images contain the target as part of the image scene, resulting in considerable appearance variation of both target and background. Three observers localized arc targets in these images, with an average accuracy of 74.7%. Data were fit by several biologically inspired models, four standard deep convolutional neural networks (CNNs), and a five-layer CNN specifically trained for this task. Four models predicted observer responses particularly well: (1) a bank of oriented filters, similar to complex cells in primate area V1; (2) a bank of oriented filters followed by tuned gain control, incorporating knowledge about cortical surround interactions; (3) a bank of oriented filters followed by local normalization; and (4) the five-layer CNN. A control experiment with optimized stimuli based on these four models showed that the observers' data were best explained by model (2) with tuned gain control. These data suggest that standard models of early vision provide good descriptions of performance in much more complex tasks than those they were designed for, while general-purpose nonlinear models such as convolutional neural networks do not.
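The front end described above, a bank of oriented (complex-cell-like) filters whose energy responses are divided by a pooled activity signal, can be sketched in a few lines of NumPy. This is a minimal illustration of the general architecture, not the paper's fitted models: the Gabor parameters, the number of orientations, and the untuned (all-orientation) normalization pool with constant `sigma` are all illustrative choices.

```python
import numpy as np

def gabor(size, wavelength, theta, sigma=None):
    """Even/odd Gabor pair at orientation theta (radians)."""
    sigma = sigma or wavelength / 2.0
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)          # coordinate along the carrier
    env = np.exp(-(x**2 + y**2) / (2 * sigma**2))       # Gaussian envelope
    even = env * np.cos(2 * np.pi * xr / wavelength)
    odd = env * np.sin(2 * np.pi * xr / wavelength)
    return even, odd

def conv2_same(img, kern):
    """Same-size 2-D convolution via FFT (circular boundary handling)."""
    K = np.zeros_like(img)
    kh, kw = kern.shape
    K[:kh, :kw] = kern
    # re-center the kernel so the response map is not spatially shifted
    K = np.roll(K, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(K)))

def oriented_energy(img, n_orient=4, size=21, wavelength=8.0):
    """Complex-cell-like energy: squared quadrature-pair responses per orientation."""
    energies = []
    for k in range(n_orient):
        even, odd = gabor(size, wavelength, k * np.pi / n_orient)
        energies.append(conv2_same(img, even) ** 2 + conv2_same(img, odd) ** 2)
    return np.stack(energies)                            # shape (n_orient, H, W)

def divisive_normalize(E, sigma=1e-3):
    """Untuned divisive normalization: each channel divided by the summed pool."""
    pool = E.sum(axis=0, keepdims=True)
    return E / (sigma + pool)

# demo on a random image
rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))
E = oriented_energy(img)
R = divisive_normalize(E)
```

Model (2) in the abstract differs from this sketch mainly in the normalization pool: instead of dividing by activity summed uniformly over orientations, a tuned gain control weights the pool by orientation and spatial position, mimicking cortical surround interactions.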


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc98/7424951/979c74ef120c/jovi-20-7-32-f001.jpg
