School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China.
School of Information Technology, Shanghai Jianqiao University, Shanghai 201306, China.
Sensors (Basel). 2020 Feb 3;20(3):811. doi: 10.3390/s20030811.
Pedestrian attribute recognition plays an important role in video surveillance and has become an active area of computer vision research. The task is very challenging because of variations in viewpoint, illumination and resolution, as well as occlusion. Existing pedestrian attribute recognition methods perform unsatisfactorily because they ignore the correlation between pedestrian attributes and spatial information. To address this, this paper treats the task as a spatiotemporal, sequential, multi-label image classification problem and proposes an attention-based neural network (CNN-CAtt-ConvLSTM) consisting of a convolutional neural network (CNN), channel attention (CAtt) and convolutional long short-term memory (ConvLSTM). First, a pre-trained CNN and CAtt extract the salient and correlated visual features of pedestrian attributes. Then, ConvLSTM further extracts spatial information and correlations from pedestrian attributes. Finally, pedestrian attributes are predicted in an optimized sequence order based on each attribute's image area size and importance. Extensive experiments on two common pedestrian attribute datasets, the PEdesTrian Attribute (PETA) dataset and the Richly Annotated Pedestrian (RAP) dataset, show higher performance than other state-of-the-art (SOTA) methods, demonstrating the superiority and validity of the proposed method.
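The abstract names a channel attention (CAtt) stage between the CNN and the ConvLSTM but does not say how it is computed. As an illustration only, the sketch below assumes a squeeze-and-excitation style gate: each channel is averaged to a scalar, a small bottleneck maps those scalars to per-channel weights in (0, 1), and the feature map is rescaled by those weights. The function name `channel_attention`, the weight matrices `w1` and `w2`, and the toy input are all hypothetical and are not taken from the paper.

```python
import math


def channel_attention(feature_map, w1, w2):
    """Squeeze-and-excitation style channel gate (an assumed form of CAtt).

    feature_map: list of C channels, each an H x W nested list of floats.
    w1: bottleneck weights, shape (C_reduced x C).
    w2: expansion weights, shape (C x C_reduced).
    Returns (rescaled feature map, per-channel gates).
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excite: linear bottleneck followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)))
        for weights in w1
    ]
    # Expand back to C values and squash each into (0, 1) with a sigmoid.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w2
    ]
    # Rescale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]
    return scaled, gates


# Toy input: two 2x2 channels, one active and one empty (made-up values).
fmap = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
w1 = [[1.0, 0.0]]        # one hidden unit that reads channel 0
w2 = [[2.0], [-2.0]]     # raises the gate of channel 0, lowers channel 1
scaled, gates = channel_attention(fmap, w1, w2)
```

In a pipeline like the one the abstract describes, the rescaled feature map would then be passed to the ConvLSTM stage. The weights here are fixed by hand for readability; in a trained network they would be learned.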