IEEE Trans Neural Netw Learn Syst. 2019 Dec;30(12):3847-3852. doi: 10.1109/TNNLS.2019.2899588. Epub 2019 Mar 12.
Video-based person re-identification (re-id) matches tracks of persons captured by different cameras. Features are extracted from each image in a sequence and then aggregated into a single track feature. Whereas existing works aggregate frame features by simple averaging or with temporal models such as recurrent neural networks, we propose a feature aggregation method based on reinforcement learning. Specifically, we train an agent to decide which frames in the sequence should be discarded during aggregation, treating frame selection as a decision-making process. In this way, the proposed method avoids introducing noisy information from the sequence and retains the valuable frames when generating the track feature. Experimental results on benchmark data sets show that our method noticeably improves re-id accuracy when applied on top of state-of-the-art models.
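The abstract contrasts plain averaging of frame features with an agent that drops frames before aggregation, but it does not specify the agent's state, action space, reward, or training algorithm. The following is a minimal pure-Python sketch of the general idea under stated assumptions, not the authors' method. It assumes a per-frame Bernoulli keep/drop policy, a hypothetical state feature (each frame's distance to the plain track mean), a hypothetical reward (cosine similarity to a reference feature of the same identity), and REINFORCE as a swapped-in policy-gradient algorithm. All function names are illustrative.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def mean_aggregate(frames: Sequence[Vector]) -> Vector:
    """Baseline aggregation: average every frame feature into one track feature."""
    n, dim = len(frames), len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def masked_aggregate(frames: Sequence[Vector], keep: Sequence[bool]) -> Vector:
    """Average only the frames the agent kept; fall back to all frames if none kept."""
    kept = [f for f, k in zip(frames, keep) if k]
    return mean_aggregate(kept if kept else frames)


def frame_distances(frames: Sequence[Vector]) -> List[float]:
    """Hypothetical per-frame state: Euclidean distance to the plain track mean."""
    mean = mean_aggregate(frames)
    return [math.dist(f, mean) for f in frames]


def keep_probs(dists: Sequence[float], weight: float, bias: float) -> List[float]:
    """Hypothetical policy: P(keep frame) = sigmoid(weight * distance + bias)."""
    return [1.0 / (1.0 + math.exp(-(weight * d + bias))) for d in dists]


def greedy_decisions(frames: Sequence[Vector], weight: float, bias: float) -> List[bool]:
    """Deterministic inference: keep a frame when P(keep) > 0.5."""
    return [p > 0.5 for p in keep_probs(frame_distances(frames), weight, bias)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, used here as a hypothetical reward signal."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.hypot(*a), math.hypot(*b)
    return dot / (na * nb) if na and nb else 0.0


def reinforce_step(
    frames: Sequence[Vector],
    weight: float,
    bias: float,
    reward_fn: Callable[[Vector], float],
    baseline: float,
    lr: float,
    rng: random.Random,
) -> Tuple[float, float, float]:
    """One REINFORCE update of the keep/drop policy on a single track.

    Each action a_i ~ Bernoulli(p_i) with p_i = sigmoid(z_i), so
    d log pi(a_i) / d z_i = a_i - p_i; the track's log-prob gradient sums over frames.
    """
    dists = frame_distances(frames)
    probs = keep_probs(dists, weight, bias)
    keep = [rng.random() < p for p in probs]          # sample keep/drop per frame
    reward = reward_fn(masked_aggregate(frames, keep))
    advantage = reward - baseline                     # baseline reduces variance
    grad_w = sum((float(k) - p) * d for k, p, d in zip(keep, probs, dists))
    grad_b = sum(float(k) - p for k, p in zip(keep, probs))
    # Gradient ascent on expected reward.
    return weight + lr * advantage * grad_w, bias + lr * advantage * grad_b, reward


# Toy track: three clean frames near [1, 0] plus one noisy frame.
track = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [9.0, 9.0]]
print(mean_aggregate(track))              # [3.0, 2.25]: the noisy frame drags the mean
keep = greedy_decisions(track, weight=-1.0, bias=6.0)
print(keep)                               # [True, True, True, False]
print(masked_aggregate(track, keep))      # [1.0, 0.0]

# One hypothetical training step against a reference feature of the same person.
reference = [1.0, 0.0]
w, b, r = reinforce_step(track, -1.0, 6.0, lambda t: cosine(t, reference),
                         baseline=0.9, lr=0.01, rng=random.Random(0))
```

In the toy run, the plain mean is pulled toward the outlier frame, while the masked mean recovers the clean feature. A negative `weight` encodes the assumed intuition that frames far from the track mean are more likely to be noise; in a learned setting, that sign and magnitude would come from training rather than being set by hand.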