Gait-CNN-ViT: Multi-Model Gait Recognition with Convolutional Neural Networks and Vision Transformer.

Affiliations

Faculty of Information Science and Technology, Multimedia University, Melaka 75450, Malaysia.

Department of Computer Science, King Khalid University, Abha 61421, Saudi Arabia.

Publication information

Sensors (Basel). 2023 Apr 7;23(8):3809. doi: 10.3390/s23083809.

DOI:10.3390/s23083809

PMID:37112147

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10143319/

Abstract

Gait recognition, the task of identifying an individual by their unique walking style, is difficult because walking style is influenced by external factors such as clothing, viewing angle, and carrying conditions. To address these challenges, this paper proposes a multi-model gait recognition system that integrates Convolutional Neural Networks (CNNs) and a Vision Transformer (ViT). The first step is to obtain a gait energy image by averaging the frames of a gait cycle. The gait energy image is then fed into three different models: DenseNet-201, VGG-16, and a Vision Transformer. These models are pre-trained and then fine-tuned to encode the salient gait features specific to an individual's walking style. Each model produces prediction scores for the classes from its encoded features, and these scores are summed and averaged across the models to determine the final class label. The system was evaluated on three datasets: CASIA-B, OU-ISIR dataset D, and the OU-ISIR Large Population dataset. The experimental results showed substantial improvement over existing methods on all three datasets. Integrating CNNs and ViT allows the system to learn both pre-defined and distinct features, providing a robust solution for gait recognition even under the influence of covariates.
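
The two steps the abstract describes most concretely, averaging a gait cycle into a gait energy image and averaging the per-model scores into a final label, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `gait_energy_image` and `fuse_scores` are hypothetical, the silhouettes are assumed to be already aligned and binarized, the three score vectors are made-up placeholders standing in for the outputs of DenseNet-201, VGG-16, and the Vision Transformer, and treating those scores as probabilities is likewise an assumption.

```python
import numpy as np


def gait_energy_image(silhouettes):
    """Average the aligned binary silhouettes of one gait cycle.

    silhouettes: array-like of shape (T, H, W) with values in {0, 1}.
    Returns an (H, W) float array; each pixel is the fraction of
    frames in which that pixel belongs to the silhouette.
    """
    frames = np.asarray(silhouettes, dtype=np.float64)
    if frames.ndim != 3 or frames.shape[0] == 0:
        raise ValueError("expected a non-empty array of shape (T, H, W)")
    return frames.mean(axis=0)


def fuse_scores(score_list):
    """Sum the per-model class scores, divide by the number of models,
    and pick the class with the highest averaged score.

    score_list: sequence of 1-D score vectors, one per model, all of
    the same length (number of classes).
    Returns (fused_scores, predicted_label).
    """
    scores = np.stack([np.asarray(s, dtype=np.float64) for s in score_list])
    fused = scores.sum(axis=0) / scores.shape[0]
    return fused, int(np.argmax(fused))


# Toy gait cycle: four 2x2 silhouette frames (illustrative values only).
cycle = np.array([
    [[0, 1], [1, 1]],
    [[0, 0], [1, 1]],
    [[1, 1], [1, 0]],
    [[1, 0], [1, 0]],
])
gei = gait_energy_image(cycle)

# Placeholder class scores for three subjects from three models.
# In a real system these would come from the fine-tuned networks.
densenet_scores = [0.6, 0.3, 0.1]
vgg_scores = [0.2, 0.5, 0.3]
vit_scores = [0.5, 0.4, 0.1]

fused, label = fuse_scores([densenet_scores, vgg_scores, vit_scores])
print(gei)
print(fused, label)
```

In this toy example the VGG-16 placeholder on its own would favor class 1, while the averaged scores favor class 0, which shows how score-level fusion can let two models outvote a third.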
