IEEE Trans Image Process. 2016 May;25(5):2353-67. doi: 10.1109/TIP.2016.2545929.
This paper proposes a novel approach to person re-identification, a fundamental task in distributed multi-camera surveillance systems. Although a variety of powerful algorithms have been presented in the past few years, most of them focus on designing hand-crafted features and learning metrics, either individually or sequentially. In contrast to previous work, we formulate a unified deep ranking framework that tackles both of these key components jointly to maximize their strengths. We start from the principle that the correct match of the probe image should be ranked first within the whole gallery set. An effective learning-to-rank algorithm is proposed to minimize the cost corresponding to the ranking disorders of the gallery. The ranking model is solved with a deep convolutional neural network (CNN) that builds the relation between input image pairs and their similarity scores through joint representation learning directly from raw image pixels. The proposed framework removes the need for feature engineering and does not rely on any assumption. An extensive comparative evaluation demonstrates that our approach significantly outperforms all state-of-the-art approaches, including both traditional and CNN-based methods, on the challenging VIPeR, CUHK-01, and CAVIAR4REID datasets. In addition, our approach generalizes better across datasets without fine-tuning.
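The ranking principle above (the correct match should outrank every other gallery image) can be illustrated with a minimal sketch. The abstract does not give the paper's actual cost function, so this uses a generic margin-based hinge over probe-gallery similarity scores as a stand-in; the function names, the `margin` parameter, and the example scores are all hypothetical, and in the proposed framework the scores would come from the CNN rather than being supplied by hand.

```python
def ranking_disorder_cost(scores, match_index, margin=1.0):
    """Generic hinge-style ranking cost (an assumption, not the paper's loss).

    scores[i] is the similarity between the probe and gallery image i;
    match_index is the position of the correct match in the gallery.
    Each wrong gallery image that is not at least `margin` below the
    correct match contributes a positive penalty.
    """
    positive = scores[match_index]
    cost = 0.0
    for i, score in enumerate(scores):
        if i == match_index:
            continue
        cost += max(0.0, margin - (positive - score))
    return cost


def rank_of_match(scores, match_index):
    """1-based rank of the correct match when sorting by descending score."""
    positive = scores[match_index]
    higher = sum(
        1 for i, score in enumerate(scores)
        if i != match_index and score > positive
    )
    return 1 + higher


# Hypothetical similarity scores for a gallery of three images.
well_ranked = [0.2, 0.9, 0.5]   # correct match (index 1) scores highest
disordered = [0.8, 0.6, 0.1]    # a wrong image outranks the correct match

print(rank_of_match(well_ranked, 1), ranking_disorder_cost(well_ranked, 1, margin=0.3))
print(rank_of_match(disordered, 1), ranking_disorder_cost(disordered, 1, margin=0.3))
```

In this sketch the cost is zero only when the correct match leads every other gallery entry by at least the margin, so minimizing it pushes the correct match toward the top rank.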