Gu Jialiang, Yi Yang, Li Qiang
Computer Science and Engineering, Sun Yat-sen University, Guangdong, China.
Front Neurosci. 2024 Mar 25;18:1370024. doi: 10.3389/fnins.2024.1370024. eCollection 2024.
Spatial-temporal modeling is crucial for action recognition in video within the field of artificial intelligence. However, robustly extracting motion information remains a primary challenge, because appearances deform over time and motion frequencies vary across actions. To address these issues, we propose an innovative and effective method, the Motion Sensitive Network (MSN), which incorporates the theory of artificial neural networks together with key concepts from autonomous-system control and decision-making. Specifically, we employ a Spatial-Temporal Pyramid Motion Extraction (STP-ME) module that synchronously adjusts convolution kernel sizes and temporal intervals to gather motion information at multiple temporal scales, in line with the learning and prediction characteristics of artificial neural networks. In addition, we introduce a Variable Scale Motion Excitation (DS-ME) module that uses a differential model to capture motion information, echoing the flexibility of autonomous-system control. In particular, we apply a multi-scale deformable convolutional network to adjust the motion scale of the target object before computing temporal differences across consecutive frames, providing theoretical support for the flexibility of autonomous systems. Temporal modeling is a crucial step in understanding environmental changes and actions within autonomous systems, and by integrating the strengths of artificial neural networks (ANNs) in this task, MSN provides an effective framework for their future use in autonomous systems. We evaluate the proposed method on three challenging action recognition datasets (Kinetics-400, Something-Something V1, and Something-Something V2). Accuracy on the test sets improves by 1.1% to 2.2%, and compared with state-of-the-art (SOTA) methods the proposed approach reaches a top accuracy of 89.90%. In ablation experiments, the proposed modules contribute performance gains of 2% to 5.3%. The Motion Sensitive Network (MSN) demonstrates significant potential in a range of challenging scenarios, providing an initial exploration of integrating artificial neural networks into the domain of autonomous systems.
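The multi-scale temporal differencing that STP-ME and DS-ME build on can be illustrated with a minimal sketch. This is not the paper's implementation (which operates on feature maps with adjustable convolution kernels and deformable convolutions); it only shows the core idea of pairing frames at several temporal intervals and taking their element-wise difference as a coarse motion signal. The frame representation (flat lists of pixel values) and the interval set are assumptions chosen for illustration.

```python
def temporal_difference_pyramid(frames, intervals=(1, 2, 4)):
    """Compute coarse motion signals at several temporal scales.

    frames: list of frames, each a flat list of pixel values.
    intervals: temporal strides; a larger interval captures
        slower, longer-range motion (a hypothetical stand-in for
        the synchronized kernel/interval scaling in STP-ME).

    Returns a dict mapping each interval k to the list of
    element-wise differences frame[t + k] - frame[t].
    """
    pyramid = {}
    for k in intervals:
        pyramid[k] = [
            [b - a for a, b in zip(frames[t], frames[t + k])]
            for t in range(len(frames) - k)
        ]
    return pyramid


# Four toy "frames" of two pixels each; fast motion shows up at
# interval 1, slower accumulated motion at interval 2.
frames = [[0, 0], [1, 2], [3, 4], [6, 8]]
motion = temporal_difference_pyramid(frames, intervals=(1, 2))
```

In the full model, each interval would be paired with a matching spatial kernel size (and, for DS-ME, a deformable resampling of the target before differencing) so that fast, small motions and slow, large motions are captured by different branches of the pyramid.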