IEEE Trans Image Process. 2017 Jul;26(7):3492-3506. doi: 10.1109/TIP.2017.2700762. Epub 2017 May 3.
Person re-identification across disjoint camera views has been widely applied in video surveillance, yet it remains a challenging problem. One of the major challenges lies in the lack of spatial and temporal cues, which makes it difficult to deal with large variations in lighting conditions, viewing angles, body poses, and occlusions. Recently, several deep-learning-based person re-identification approaches have been proposed and have achieved remarkable performance. However, most of these approaches extract discriminative features from the whole frame in a single glimpse, without differentiating the various parts of the persons to be identified. It is essential to examine multiple highly discriminative local regions of person images in detail, through multiple glimpses, in order to handle large appearance variations. In this paper, we propose a new soft-attention-based model, the end-to-end comparative attention network (CAN), specifically tailored for the task of person re-identification. The end-to-end CAN learns to selectively focus on parts of pairs of person images after taking a few glimpses of them and adaptively comparing their appearance. The CAN model is able to learn which parts of the images are relevant for discerning persons, and it automatically integrates information from different parts to determine whether a pair of images belongs to the same person. In other words, our proposed CAN model simulates the human perception process to verify whether two images are of the same person. Extensive experiments on four benchmark person re-identification data sets, including CUHK01, CUHK03, Market-1501, and VIPeR, clearly demonstrate that our proposed end-to-end CAN for person re-identification significantly outperforms well-established baselines and offers new state-of-the-art performance.
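The core mechanism the abstract describes, soft attention over local regions of a pair of images followed by a comparison of the attended descriptors, can be illustrated with a minimal sketch. This is not the authors' implementation: the part features, the dot-product scoring, and the Euclidean comparison below are simplifying assumptions made purely for illustration.

```python
import math

def softmax(scores):
    # Standard numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(part_features, query):
    # Soft attention over local part features: score each part against a
    # query vector (dot product here, an assumption), normalize with softmax,
    # and return the attention-weighted sum as one "glimpse" descriptor.
    scores = [sum(p * q for p, q in zip(part, query)) for part in part_features]
    weights = softmax(scores)
    dim = len(part_features[0])
    return [sum(w * part[d] for w, part in zip(weights, part_features))
            for d in range(dim)]

def same_person_score(parts_a, parts_b, query):
    # Compare the attended descriptors of two images; higher (less negative)
    # means the pair is more likely the same person. Negative Euclidean
    # distance is used here as a stand-in similarity measure.
    ga = attend(parts_a, query)
    gb = attend(parts_b, query)
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(ga, gb)))
```

In the actual model, the part features would come from convolutional feature maps, the attention would be recurrent across multiple glimpses, and the comparison would be trained end to end; this sketch only shows the single-glimpse soft-attention step.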