

Multi-Level Fusion Temporal-Spatial Co-Attention for Video-Based Person Re-Identification.

Authors

Pei Shengyu, Fan Xiaoping

Affiliations

School of Automation, Central South University, Changsha 410075, China.

School of Information Technology and Management, Hunan University of Finance and Economics, Changsha 410205, China.

Publication

Entropy (Basel). 2021 Dec 15;23(12):1686. doi: 10.3390/e23121686.

DOI: 10.3390/e23121686
PMID: 34945992
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8700156/
Abstract

A convolutional neural network can easily fall into local minima when training data are insufficient, and its training is unstable. Many current methods address these problems by adding pedestrian attributes, pedestrian poses, and other auxiliary information, but such information requires additional collection, which is time-consuming and laborious. Moreover, each frame in a video sequence has a different degree of similarity. In this paper, multi-level fusion temporal-spatial co-attention is adopted to improve video-based person re-identification (reID). For a small dataset, the improved network better prevents over-fitting and mitigates the dataset's limitations. Specifically, the concept of knowledge evolution is introduced into video-based person re-identification to improve the backbone residual neural network (ResNet). A global branch, a local branch, and an attention branch are used in parallel for feature extraction, and the three high-level features are embedded in a metric learning network to improve the network's generalization ability and the accuracy of video-based person re-identification. Simulation experiments on the small datasets PRID2011 and iLIDS-VID show that the improved network better prevents over-fitting; experiments on MARS and DukeMTMC-VideoReID show that the proposed method extracts more feature information and improves the network's generalization ability. The results show that our method achieves better performance, reaching 90.15% Rank-1 and 81.91% mAP on MARS.
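The three-branch aggregation described in the abstract can be sketched in a highly simplified NumPy form. This is an illustrative assumption, not the paper's exact method: the spatial pooling choices, the number of horizontal stripes in the local branch, and the norm-based temporal attention weighting are all placeholders for the learned components described in the paper.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def clip_descriptor(frames, num_parts=4):
    """Aggregate frame-level feature maps (T, C, H, W) into one clip
    descriptor via three parallel branches (global / local / attention).
    Shapes and pooling are assumptions for illustration only."""
    T, C, H, W = frames.shape
    per_frame = frames.mean(axis=(2, 3))          # (T, C) spatial average pool
    # Global branch: temporal mean of per-frame features.
    global_feat = per_frame.mean(axis=0)          # (C,)
    # Local branch: split the height axis into horizontal stripes, pool each.
    step = H // num_parts
    local_feats = [frames[:, :, i * step:(i + 1) * step, :].mean(axis=(0, 2, 3))
                   for i in range(num_parts)]     # num_parts arrays of shape (C,)
    # Attention branch: weight frames by their feature energy (a stand-in
    # for the learned temporal co-attention).
    weights = softmax(np.linalg.norm(per_frame, axis=1))  # (T,), sums to 1
    attn_feat = weights @ per_frame               # (C,)
    return np.concatenate([global_feat, *local_feats, attn_feat])
```

Two clip descriptors produced this way could then be compared with a cosine or Euclidean distance for retrieval, which is the role the metric learning network plays in the paper.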


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eeed/8700156/2bd506bf77a5/entropy-23-01686-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eeed/8700156/62505fdee097/entropy-23-01686-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eeed/8700156/809ea784391c/entropy-23-01686-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eeed/8700156/9a913b296f56/entropy-23-01686-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eeed/8700156/217dffb936ed/entropy-23-01686-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eeed/8700156/d7f2fb7341f6/entropy-23-01686-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eeed/8700156/3b9f8d4aa1aa/entropy-23-01686-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eeed/8700156/3b9d50c9c800/entropy-23-01686-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eeed/8700156/c68ab3b9f840/entropy-23-01686-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eeed/8700156/f5c995bc99a2/entropy-23-01686-g010.jpg

Similar Articles

1. Multi-Level Fusion Temporal-Spatial Co-Attention for Video-Based Person Re-Identification.
   Entropy (Basel). 2021 Dec 15;23(12):1686. doi: 10.3390/e23121686.
2. Multi-scale Temporal Cues Learning for Video Person Re-Identification.
   IEEE Trans Image Process. 2020 Feb 14. doi: 10.1109/TIP.2020.2972108.
3. Video-based person re-identification with complementary local and global features using a graph transformer.
   Math Biosci Eng. 2024 Jul 23;21(7):6694-6709. doi: 10.3934/mbe.2024293.
4. Multi-granularity graph pooling for video-based person re-identification.
   Neural Netw. 2023 Mar;160:22-33. doi: 10.1016/j.neunet.2022.12.015. Epub 2022 Dec 28.
5. Video Person Re-identification by Temporal Residual Learning.
   IEEE Trans Image Process. 2018 Oct 29. doi: 10.1109/TIP.2018.2878505.
6. Video-Based Person Re-Identification by an End-To-End Learning Architecture with Hybrid Deep Appearance-Temporal Feature.
   Sensors (Basel). 2018 Oct 29;18(11):3669. doi: 10.3390/s18113669.
7. Video Person Re-Identification with Frame Sampling-Random Erasure and Mutual Information-Temporal Weight Aggregation.
   Sensors (Basel). 2022 Apr 15;22(8):3047. doi: 10.3390/s22083047.
8. Pedestrian re-identification based on attention mechanism and multi-scale feature fusion.
   Math Biosci Eng. 2023 Aug 25;20(9):16913-16938. doi: 10.3934/mbe.2023754.
9. Adaptive Graph Representation Learning for Video Person Re-identification.
   IEEE Trans Image Process. 2020 Jun 17;PP. doi: 10.1109/TIP.2020.3001693.
10. Exploring High-Order Spatio-Temporal Correlations From Skeleton for Person Re-Identification.
    IEEE Trans Image Process. 2023;32:949-963. doi: 10.1109/TIP.2023.3236144. Epub 2023 Jan 23.

References Cited in This Article

1. A Two-Stream Dynamic Pyramid Representation Model for Video-Based Person Re-Identification.
   IEEE Trans Image Process. 2021;30:6266-6276. doi: 10.1109/TIP.2021.3093759. Epub 2021 Jul 12.
2. Learning Generalisable Omni-Scale Representations for Person Re-Identification.
   IEEE Trans Pattern Anal Mach Intell. 2021 Mar 26;PP. doi: 10.1109/TPAMI.2021.3069237.
3. Identifying Visible Parts via Pose Estimation for Occluded Person Re-Identification.
   IEEE Trans Neural Netw Learn Syst. 2022 Sep;33(9):4624-4634. doi: 10.1109/TNNLS.2021.3059515. Epub 2022 Aug 31.
4. Adaptive Graph Representation Learning for Video Person Re-identification.
   IEEE Trans Image Process. 2020 Jun 17;PP. doi: 10.1109/TIP.2020.3001693.
5. Ordered or Orderless: A Revisit for Video Based Person Re-Identification.
   IEEE Trans Pattern Anal Mach Intell. 2021 Apr;43(4):1460-1466. doi: 10.1109/TPAMI.2020.2976969. Epub 2021 Mar 4.
6. Person Re-Identification With Deep Kronecker-Product Matching and Group-Shuffling Random Walk.
   IEEE Trans Pattern Anal Mach Intell. 2021 May;43(5):1649-1665. doi: 10.1109/TPAMI.2019.2954313. Epub 2021 Apr 1.
7. Person Re-Identification by Discriminative Selection in Video Ranking.
   IEEE Trans Pattern Anal Mach Intell. 2016 Dec;38(12):2501-2514. doi: 10.1109/TPAMI.2016.2522418. Epub 2016 Jan 27.