IEEE Trans Neural Netw Learn Syst. 2021 Oct;32(10):4460-4474. doi: 10.1109/TNNLS.2020.3017939. Epub 2021 Oct 5.
Person reidentification (reID) by convolutional neural network (CNN)-based networks has achieved favorable performance in recent years. However, most existing CNN-based methods do not take full advantage of spatial-temporal context modeling. In fact, the global spatial-temporal context can greatly help resolve local distractions and enhance the target feature representation. To comprehensively leverage the spatial-temporal context information, in this work, we present a novel interaction-aggregation-update (IAU) block for high-performance person reID. First, the spatial-temporal IAU (STIAU) module is introduced. STIAU jointly incorporates two types of contextual interactions into a CNN framework for target feature learning. Here, the spatial interactions compute the contextual dependencies between different body parts of a single frame, while the temporal interactions capture the contextual dependencies between the same body parts across all frames. Furthermore, a channel IAU (CIAU) module is designed to model the semantic contextual interactions between channel features to enhance the feature representation, especially for small-scale visual cues and body parts. Therefore, the IAU block enables the feature to incorporate the global spatial, temporal, and channel context. It is lightweight, end-to-end trainable, and can be easily plugged into existing CNNs to form IAUnet. The experiments show that IAUnet performs favorably against state-of-the-art methods on both image and video reID tasks and achieves compelling results on a general object categorization task. The source code is available at https://github.com/blue-blue272/ImgReID-IAnet.
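To make the interaction-aggregation-update idea concrete, the sketch below shows, in plain PyTorch, one way the two kinds of context described above could be modeled: a spatial-temporal interaction over part features pooled from all frames, followed by a channel interaction. This is a minimal illustration under assumed input shapes (batch, frames, parts, channels); the class and parameter names are hypothetical and it is not the authors' released implementation.

```python
# Minimal sketch of an interaction-aggregation-update style block.
# Assumes part-pooled features of shape (batch, frames, parts, channels).
# All module and variable names here are illustrative, not from the paper's code.
import torch
import torch.nn as nn


class SpatialTemporalInteraction(nn.Module):
    """Relate every body part to every other part across all frames."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        hidden = channels // reduction
        self.query = nn.Linear(channels, hidden)
        self.key = nn.Linear(channels, hidden)
        self.value = nn.Linear(channels, channels)
        self.update = nn.Linear(channels, channels)

    def forward(self, x):
        # x: (B, T, P, C) -> treat every (frame, part) pair as one node.
        b, t, p, c = x.shape
        nodes = x.reshape(b, t * p, c)
        # Interaction: pairwise affinities between all (frame, part) nodes,
        # covering both spatial (within-frame) and temporal (cross-frame) pairs.
        attn = torch.softmax(
            self.query(nodes) @ self.key(nodes).transpose(1, 2), dim=-1
        )
        # Aggregation: gather context for every node from all other nodes.
        context = attn @ self.value(nodes)
        # Update: fuse the aggregated context back into the original feature.
        out = nodes + self.update(context)
        return out.reshape(b, t, p, c)


class ChannelInteraction(nn.Module):
    """Relate channels to each other to emphasize small-scale cues."""

    def forward(self, x):
        b, t, p, c = x.shape
        feats = x.reshape(b, t * p, c)
        # Channel-by-channel affinity matrix (C x C).
        affinity = torch.softmax(feats.transpose(1, 2) @ feats, dim=-1)
        refined = feats @ affinity
        return (feats + refined).reshape(b, t, p, c)


if __name__ == "__main__":
    x = torch.randn(2, 4, 6, 64)  # batch=2, 4 frames, 6 parts, 64 channels
    x = SpatialTemporalInteraction(64)(x)
    x = ChannelInteraction()(x)
    print(x.shape)  # torch.Size([2, 4, 6, 64])
```

Because both modules only add to the input features (residual updates) and operate on pooled part descriptors, a block of this kind stays lightweight and can be dropped after an existing CNN stage without changing the backbone, which is the plug-in property the abstract highlights.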