Sheng Li, Ming Shao, Yun Fu
IEEE Trans Pattern Anal Mach Intell. 2018 Dec;40(12):2963-2977. doi: 10.1109/TPAMI.2017.2764893. Epub 2017 Oct 26.
Person re-identification plays an important role in many safety-critical applications. Existing work mainly focuses on extracting patch-level features or learning distance metrics. However, the representation power of the extracted features may be limited because of the varied viewing conditions of pedestrian images in complex real-world scenarios. To improve the representation power of features, in this paper we learn discriminative and robust representations via dictionary learning. First, we propose a Cross-view Dictionary Learning (CDL) model, which is a general solution to the multi-view learning problem. Inspired by dictionary-learning-based domain adaptation, CDL learns a pair of dictionaries from two views. In particular, CDL adopts a projective learning strategy, which is more efficient than the optimization used in traditional dictionary learning. Second, we propose a Cross-view Multi-level Dictionary Learning (CMDL) approach based on CDL. CMDL contains dictionary learning models at different representation levels: image level, horizontal part level, and patch level. The proposed models take advantage of view-consistency information and adaptively learn pairs of dictionaries to generate robust and compact representations of pedestrian images. Third, we incorporate a discriminative regularization term into CMDL and propose a CMDL-Dis approach, which learns pairs of discriminative dictionaries at the image level and the part level. We devise efficient optimization algorithms to solve the proposed models. Finally, a fusion strategy is used to generate the similarity scores for test images. Experiments on the public VIPeR, CUHK Campus, iLIDS, GRID, and PRID450S datasets show that our approach achieves state-of-the-art performance.
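The abstract does not state the CDL objective. The sketch below is therefore only a rough illustration of what a "projective" cross-view dictionary pair can look like, and it is not the authors' formulation. It assumes a hypothetical objective in the spirit of projective dictionary pair learning. Each view v has a dictionary D_v and a projection P_v, and the codes A_v are tied to P_v X_v. A coupling term lam * ||A_1 - A_2||^2 enforces view consistency, and ridge penalties keep every block update in closed form. The claimed efficiency shows up at test time: encoding a new image is a single matrix product P_v x rather than an iterative sparse-coding solve. The function names, the weights `tau`, `lam`, `mu`, `gamma`, and the weighted-sum fusion in `fuse_scores` are all assumptions made for this sketch.

```python
import numpy as np


def cross_view_projective_dl(X1, X2, k=20, tau=1.0, lam=1.0, mu=0.1,
                             gamma=0.1, n_iter=30, seed=0):
    """Hypothetical projective cross-view dictionary learning (illustrative only).

    X1: (d1, n) and X2: (d2, n); column j of both is the same person seen
    from view 1 and view 2. Minimizes, by exact block-coordinate updates,
        sum_v ||X_v - D_v A_v||^2 + tau ||P_v X_v - A_v||^2
              + mu ||P_v||^2 + gamma ||D_v||^2  +  lam ||A_1 - A_2||^2
    Returns the dictionaries D_v, projections P_v and the objective per iteration.
    """
    rng = np.random.default_rng(seed)
    Xs = [X1, X2]
    Ds = [rng.standard_normal((X.shape[0], k)) for X in Xs]
    Ps = [0.01 * rng.standard_normal((k, X.shape[0])) for X in Xs]
    I = np.eye(k)
    history = []

    def objective(As):
        val = lam * np.sum((As[0] - As[1]) ** 2)
        for X, D, P, A in zip(Xs, Ds, Ps, As):
            val += np.sum((X - D @ A) ** 2) + tau * np.sum((P @ X - A) ** 2)
            val += mu * np.sum(P ** 2) + gamma * np.sum(D ** 2)
        return val

    for _ in range(n_iter):
        # Codes: lam couples A_1 and A_2, so both views are solved jointly
        # from the stationarity conditions of the convex code subproblem.
        M1 = Ds[0].T @ Ds[0] + (tau + lam) * I
        M2 = Ds[1].T @ Ds[1] + (tau + lam) * I
        lhs = np.block([[M1, -lam * I], [-lam * I, M2]])
        rhs = np.vstack([Ds[0].T @ X1 + tau * (Ps[0] @ X1),
                         Ds[1].T @ X2 + tau * (Ps[1] @ X2)])
        A = np.linalg.solve(lhs, rhs)
        As = [A[:k], A[k:]]
        for v, X in enumerate(Xs):
            # Projection P_v: ridge regression mapping X_v onto its codes A_v.
            G = tau * (X @ X.T) + mu * np.eye(X.shape[0])
            Ps[v] = np.linalg.solve(G, tau * (X @ As[v].T)).T
            # Dictionary D_v: ridge regression reconstructing X_v from A_v.
            H = As[v] @ As[v].T + gamma * I
            Ds[v] = np.linalg.solve(H, As[v] @ X.T).T
        history.append(objective(As))
    return Ds, Ps, history


def match_scores(P1, P2, probe, gallery):
    """Cosine similarity between projected probe (view 1) and gallery (view 2) columns."""
    a = P1 @ probe      # encoding is one matrix product, no iterative coding
    b = P2 @ gallery
    a = a / (np.linalg.norm(a, axis=0, keepdims=True) + 1e-12)
    b = b / (np.linalg.norm(b, axis=0, keepdims=True) + 1e-12)
    return a.T @ b      # shape (n_probe, n_gallery)


def fuse_scores(score_mats, weights):
    """Weighted sum of per-level score matrices (one simple, assumed fusion choice)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * S for wi, S in zip(w, score_mats))
```

Consider synthetic paired features `X1` (view 1) and `X2` (view 2). Calling `Ds, Ps, hist = cross_view_projective_dl(X1, X2, k=10)` and then `match_scores(Ps[0], Ps[1], X1, X2)` gives a probe-by-gallery similarity matrix. Scores computed the same way from image-, part-, and patch-level features could then be combined with `fuse_scores`. Every block update is an exact minimizer of a convex subproblem, so `hist` should be non-increasing.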