Stanford University, Stanford, CA 94305, U.S.A.
MIT, Cambridge, MA 02139, U.S.A.
Neural Comput. 2022 Jul 14;34(8):1652-1675. doi: 10.1162/neco_a_01506.
The computational role of the abundant feedback connections in the ventral visual stream, which enables humans and nonhuman primates to effortlessly recognize objects across a multitude of viewing conditions, is unclear. Prior studies have augmented feedforward convolutional neural networks (CNNs) with recurrent connections to study their role in visual processing; however, these recurrent networks are often optimized directly on neural data, or the comparative metrics used are undefined for standard feedforward networks that lack these connections. In this work, we develop task-optimized convolutional recurrent (ConvRNN) network models that more faithfully mimic the timing and gross neuroanatomy of the ventral pathway. Properly chosen intermediate-depth ConvRNN circuit architectures, which incorporate mechanisms of feedforward bypassing and recurrent gating, can achieve high performance on a core recognition task, comparable to that of much deeper feedforward networks. We then develop methods that allow us to compare both CNNs and ConvRNNs to fine-grained measurements of primate categorization behavior and to neural response trajectories across thousands of stimuli. We find that high-performing ConvRNNs provide a better match to these data than feedforward networks of any depth, predicting the precise timings at which each stimulus is behaviorally decoded from neural activation patterns. Moreover, these ConvRNN circuits consistently produce quantitatively accurate predictions of neural dynamics from V4 and IT across the entire stimulus presentation. In fact, we find that the highest-performing ConvRNNs, which best match neural and behavioral data, also achieve a strong Pareto trade-off between task performance and overall network size. Taken together, our results suggest that the functional purpose of recurrence in the ventral pathway is to fit a high-performing network in cortex, attaining computational power through temporal rather than spatial complexity.
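To make the notion of a Pareto trade-off between task performance and network size concrete, the Python sketch below computes the Pareto front over a set of models, each summarized by a parameter count (an assumed proxy for network size) and an accuracy. A model is on the front if no other model is both no larger and no less accurate while being strictly better on at least one axis. The model names and numbers are hypothetical placeholders for illustration only, not values reported in this work, and this is not presented as the authors' evaluation procedure.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str          # hypothetical model label
    num_params: int    # network size (lower is better); assumed proxy
    accuracy: float    # task performance (higher is better)


def dominates(a: Model, b: Model) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.num_params <= b.num_params and a.accuracy >= b.accuracy
    strictly_better = a.num_params < b.num_params or a.accuracy > b.accuracy
    return no_worse and strictly_better


def pareto_front(models: List[Model]) -> List[Model]:
    """Return the models not dominated by any other model, sorted by size."""
    front = [m for m in models if not any(dominates(o, m) for o in models)]
    return sorted(front, key=lambda m: m.num_params)


# Hypothetical toy numbers for illustration only; not results from the paper.
candidates = [
    Model("shallow_cnn", 5_000_000, 0.60),
    Model("deep_cnn", 60_000_000, 0.76),
    Model("mid_convrnn", 15_000_000, 0.73),
    Model("wide_cnn", 20_000_000, 0.68),
]

front = pareto_front(candidates)
for m in front:
    print(m.name, m.num_params, m.accuracy)
```

With these toy values, `wide_cnn` drops off the front because `mid_convrnn` is both smaller and more accurate, while the remaining three models each represent a different, non-dominated balance of size and accuracy.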