Fadaei Amir Hosein, Dehaqani Mohammad-Reza A
College of Engineering, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran.
School of Cognitive Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran.
Sci Rep. 2024 Jul 4;14(1):15366. doi: 10.1038/s41598-024-66346-w.
Traditionally, vision models have predominantly relied on spatial features extracted from static images, deviating from the continuous stream of spatiotemporal features processed by the brain in natural vision. While numerous video-understanding models have emerged, the incorporation of videos with spatiotemporal features into image-understanding models has been limited. Drawing inspiration from natural vision, which exhibits remarkable resilience to input changes, our research focuses on the development of a brain-inspired model for vision understanding trained with videos. Our findings demonstrate that models trained on videos rather than still images, which incorporate temporal features, become more resilient to various alterations of the input media.