IEEE Trans Image Process. 2018 Jul;27(7):3275-3287. doi: 10.1109/TIP.2018.2819820.
Vehicle re-identification (re-ID) is an area that has received far less attention in the computer vision community than the prevalent person re-ID. Possible reasons for this slow progress are the lack of appropriate research data and the special 3D structure of a vehicle. Previous works have generally focused on some specific views (e.g., front), but these methods are less effective in realistic scenarios, where vehicles appear at arbitrary viewpoints to cameras. In this paper, we focus on the uncertainty of vehicle viewpoint in re-ID, proposing two end-to-end deep architectures: the spatially concatenated ConvNet and the convolutional neural network (CNN)-LSTM bi-directional loop. Our models exploit the great advantages of the CNN and long short-term memory (LSTM) to learn transformations across different viewpoints of vehicles. Thus, a multi-view vehicle representation containing information from all viewpoints can be inferred from a single input view and then used for distance metric learning. To verify our models, we also introduce a Toy Car re-ID data set with images from multiple viewpoints of 200 vehicles. We evaluate our proposed methods on the Toy Car re-ID data set and the public Multi-View Car, VehicleID, and VeRi data sets. Experimental results show that our models achieve consistent improvements over state-of-the-art vehicle re-ID approaches.
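The abstract's core idea, inferring a multi-view representation from a single input view by unrolling a recurrent transformation, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the CNN feature extractor is stubbed with a fixed random vector, the LSTM cell is hand-rolled in NumPy, and all dimensions, weights, and the feedback scheme (feeding each inferred view's hidden state back as the next input) are assumptions for illustration.

```python
import numpy as np

# Hedged sketch (not the paper's code): starting from one view's feature
# vector (a stand-in for CNN output), unroll an LSTM cell V times to
# infer features for the remaining viewpoints, then concatenate all
# hidden states into a single multi-view representation used for
# distance-based re-ID matching.

rng = np.random.default_rng(0)
D, H, V = 8, 8, 4          # feature dim, hidden dim, number of viewpoints


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMCell:
    """Minimal LSTM cell with the four gates packed into one matrix."""

    def __init__(self, d, h):
        s = 1.0 / np.sqrt(h)
        self.W = rng.uniform(-s, s, (4 * h, d + h))  # i, f, g, o gates
        self.b = np.zeros(4 * h)
        self.h_dim = h

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_new = sigmoid(o) * np.tanh(c_new)
        return h_new, c_new


def multi_view_repr(cell, single_view_feat):
    """Unroll the LSTM from one view's features and concatenate the
    hidden states as the multi-view representation (shape: V * H)."""
    h = np.zeros(cell.h_dim)
    c = np.zeros(cell.h_dim)
    x = single_view_feat
    views = []
    for _ in range(V):
        h, c = cell.step(x, h, c)
        views.append(h)
        x = h                      # feed the inferred view back as input
    return np.concatenate(views)


cell = LSTMCell(D, H)
feat_a = rng.normal(size=D)        # stand-in CNN feature, vehicle A, one view
feat_b = rng.normal(size=D)        # stand-in CNN feature, vehicle B, another view

rep_a = multi_view_repr(cell, feat_a)
rep_b = multi_view_repr(cell, feat_b)
dist = np.linalg.norm(rep_a - rep_b)   # Euclidean distance for matching
```

In the paper the transformation weights would be trained end-to-end with the CNN under a re-ID objective, so that small distances correspond to the same vehicle identity regardless of which view each camera captured.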