Department of Psychology, University of Amsterdam, Amsterdam, Netherlands.
Amsterdam Brain & Cognition (ABC), University of Amsterdam, Amsterdam, Netherlands.
PLoS Comput Biol. 2023 Jun 9;19(6):e1011169. doi: 10.1371/journal.pcbi.1011169. eCollection 2023 Jun.
Humans can quickly recognize objects in a dynamically changing world. This ability is showcased by the fact that observers succeed at recognizing objects in rapidly changing image sequences, at rates as fast as 13 ms/image. To date, the mechanisms that govern dynamic object recognition remain poorly understood. Here, we developed deep learning models for dynamic recognition and compared different computational mechanisms, contrasting feedforward with recurrent processing, single-image with sequential processing, and different forms of adaptation. We found that only models that integrate images sequentially via lateral recurrence mirrored human performance (N = 36) and were predictive of trial-by-trial responses across image durations (13-80 ms/image). Importantly, models with sequential lateral-recurrent integration also captured how human performance changes as a function of image presentation duration: models processing images for a few time steps captured human object recognition at shorter presentation durations, whereas models processing images for more time steps captured it at longer presentation durations. Furthermore, augmenting such a recurrent model with adaptation markedly improved dynamic recognition performance and accelerated its representational dynamics, thereby predicting human trial-by-trial responses using fewer processing resources. Together, these findings provide new insights into the mechanisms that render object recognition so fast and effective in a dynamic visual world.