IEEE Trans Pattern Anal Mach Intell. 2017 Jun;39(6):1089-1102. doi: 10.1109/TPAMI.2016.2567386. Epub 2016 May 12.
Cross-domain visual data matching is one of the fundamental problems in many real-world vision tasks, e.g., matching persons across ID photos and surveillance videos. Conventional approaches to this problem usually involve two steps: i) projecting samples from different domains into a common space, and ii) computing (dis-)similarity in this space based on a certain distance. In this paper, we present a novel pairwise similarity measure that advances existing models by i) expanding traditional linear projections into affine transformations and ii) fusing affine Mahalanobis distance and Cosine similarity through a data-driven combination. Moreover, we unify our similarity measure with feature representation learning via deep convolutional neural networks. Specifically, we incorporate the similarity measure matrix into the deep architecture, enabling end-to-end model optimization. We extensively evaluate our generalized similarity model on several challenging cross-domain matching tasks: person re-identification across different views and face verification across different modalities (i.e., faces from still images and videos, older and younger faces, and sketch and photo portraits). The experimental results demonstrate the superior performance of our model over other state-of-the-art methods.
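To make the abstract's central idea concrete, below is a minimal NumPy sketch of a generalized pairwise similarity of the kind described: a single bilinear form over the concatenated pair of features that subsumes both an affine Mahalanobis distance and a Cosine-style similarity as special cases. The block structure and parameter names (A, B, C, d, e, f) are illustrative assumptions based on the abstract, not necessarily the paper's exact formulation; in the full model these parameters would be learned jointly with the CNN features rather than sampled at random.

```python
import numpy as np

def generalized_similarity(x, y, A, B, C, d, e, f):
    """Sketch of a generalized pairwise similarity (illustrative, not the
    paper's exact parameterization).

    Evaluates the bilinear form
        S(x, y) = x^T A x + y^T B y + 2 x^T C y + 2 d^T x + 2 e^T y + f,
    where the quadratic blocks A, B (kept positive semi-definite) recover an
    affine Mahalanobis distance and the cross term C plays the role of a
    Cosine-like correlation between the two domains.
    """
    return (x @ A @ x + y @ B @ y + 2 * x @ C @ y
            + 2 * d @ x + 2 * e @ y + f)

# Toy usage with random parameters in a 4-D feature space.
rng = np.random.default_rng(0)
dim = 4
A_half = rng.standard_normal((dim, dim))
B_half = rng.standard_normal((dim, dim))
A = A_half @ A_half.T  # positive semi-definite by construction
B = B_half @ B_half.T  # positive semi-definite by construction
C = rng.standard_normal((dim, dim))
d, e = rng.standard_normal(dim), rng.standard_normal(dim)
f = 1.0

x, y = rng.standard_normal(dim), rng.standard_normal(dim)
print(generalized_similarity(x, y, A, B, C, d, e, f))
```

Because the whole measure is a smooth function of its matrix parameters, it can sit as a layer on top of two CNN feature branches and be trained end-to-end by backpropagation, which is the unification the abstract refers to.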