Department of Psychology, Harvard University, Cambridge, MA, USA.
Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA.
J Vis. 2021 May 3;21(5):3. doi: 10.1167/jov.21.5.3.
The vision sciences literature contains a large diversity of experimental and theoretical approaches to the study of visual attention. We argue that this diversity arises, at least in part, from the field's inability to unify differing theoretical perspectives. In particular, the field has been hindered by a lack of a principled formal framework for simultaneously thinking about both optimal attentional processing and capacity-limited attentional processing, where capacity is limited in a general, task-independent manner. Here, we supply such a framework based on rate-distortion theory (RDT) and optimal lossy compression. Our approach defines Bayes-optimal performance when an upper limit on information processing rate is imposed. In this article, we compare Bayesian and RDT accounts in both uncued and cued visual search tasks. We start by highlighting a typical shortcoming of unlimited-capacity Bayesian models that is not shared by RDT models, namely, that they often overestimate task performance when information-processing demands are increased. Next, we reexamine data from two cued-search experiments that have previously been modeled as the result of unlimited-capacity Bayesian inference and demonstrate that they can just as easily be explained as the result of optimal lossy compression. To model cued visual search, we introduce the concept of a "conditional communication channel." This simple extension generalizes the lossy-compression framework such that it can, in principle, predict optimal attentional-shift behavior in any kind of perceptual task, even when inputs to the model are raw sensory data such as image pixels. To demonstrate this idea's viability, we compare our idealized model of cued search, which operates on a simplified abstraction of the stimulus, to a deep neural network version that performs approximately optimal lossy compression on the real (pixel-level) experimental stimuli.
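As an illustration of what optimal performance under an upper limit on information rate means, the following is a minimal sketch of the standard Blahut-Arimoto iteration for a discrete rate-distortion problem. It is not the model used in the article. The function name `blahut_arimoto`, the trade-off parameter `beta`, and the binary source with Hamming distortion are assumptions chosen only for illustration.

```python
import math


def blahut_arimoto(p_x, dist, beta, n_iter=200):
    """Illustrative rate-distortion solver (hypothetical helper, not from the article).

    p_x  : list of source probabilities p(x)
    dist : dist[x][y] is the distortion of representing x as y
    beta : trade-off parameter; larger beta favors low distortion over low rate
    Returns (rate in bits, expected distortion) for the resulting channel.
    """
    n_x, n_y = len(p_x), len(dist[0])
    q_y = [1.0 / n_y] * n_y  # start from a uniform output marginal
    cond = []
    for _ in range(n_iter):
        # Update the channel: Q(y|x) proportional to q(y) * exp(-beta * d(x, y)).
        cond = []
        for x in range(n_x):
            w = [q_y[y] * math.exp(-beta * dist[x][y]) for y in range(n_y)]
            z = sum(w)
            cond.append([v / z for v in w])
        # Update the output marginal: q(y) = sum_x p(x) Q(y|x).
        q_y = [sum(p_x[x] * cond[x][y] for x in range(n_x)) for y in range(n_y)]

    # Mutual information (rate) and expected distortion of the final channel.
    rate, distortion = 0.0, 0.0
    for x in range(n_x):
        for y in range(n_y):
            p_xy = p_x[x] * cond[x][y]
            if p_xy > 0.0:
                rate += p_xy * math.log2(cond[x][y] / q_y[y])
            distortion += p_xy * dist[x][y]
    return rate, distortion


# Assumed toy problem: a fair binary source with Hamming (0/1) distortion.
source = [0.5, 0.5]
hamming = [[0.0, 1.0], [1.0, 0.0]]
low_rate, high_dist = blahut_arimoto(source, hamming, beta=2.0)
high_rate, low_dist = blahut_arimoto(source, hamming, beta=4.0)
```

Sweeping `beta` traces out the trade-off curve: a tighter capacity limit (lower rate) forces a higher expected distortion, which is the sense in which a capacity-limited observer can be optimal yet perform worse than an unlimited-capacity one.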