Yu Jun, Zhu Chaoyang, Zhang Jian, Huang Qingming, Tao Dacheng
IEEE Trans Neural Netw Learn Syst. 2020 Feb;31(2):661-674. doi: 10.1109/TNNLS.2019.2908982. Epub 2019 Apr 26.
We propose an end-to-end place recognition model based on a novel deep neural network. First, we exploit the spatial pyramid structure of images to enhance the vector of locally aggregated descriptors (VLAD), so that the enhanced VLAD features reflect the structural information of the images. To incorporate this feature extraction into a deep network, we build a spatial pyramid-enhanced VLAD (SPE-VLAD) layer. Next, we impose weight constraints on the terms of the traditional triplet loss (T-loss) function so that the resulting weighted T-loss (WT-loss) function avoids suboptimal convergence of the learning process. The loss function works well in weakly supervised scenarios because it determines the semantically positive and negative samples of each query using not only the GPS tags but also the Euclidean distance between the image representations. The SPE-VLAD layer and the WT-loss layer are integrated with the VGG-16 or ResNet-18 network to form a novel end-to-end deep neural network that can be trained with standard backpropagation. We conduct experiments on three benchmark data sets, and the results demonstrate that the proposed model outperforms state-of-the-art deep learning approaches to place recognition.
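The idea of enhancing VLAD with a spatial pyramid can be illustrated with a minimal sketch. It is not the SPE-VLAD layer itself: the abstract does not specify the assignment scheme, the pyramid levels, or how the cells are combined, so the hard nearest-center assignment, the `levels=(1, 2)` default, the plain concatenation, and the function names `vlad` and `spatial_pyramid_vlad` are all assumptions made for illustration.

```python
import math


def vlad(descriptors, centers):
    """Hard-assignment VLAD (an assumption): sum the residuals of each
    descriptor to its nearest center, then L2-normalize the result."""
    dim = len(centers[0])
    acc = [[0.0] * dim for _ in centers]
    for d in descriptors:
        # Index of the nearest center by squared Euclidean distance.
        k = min(
            range(len(centers)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(d, centers[j])),
        )
        for i in range(dim):
            acc[k][i] += d[i] - centers[k][i]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


def spatial_pyramid_vlad(feature_map, centers, levels=(1, 2)):
    """feature_map is an H x W grid of local descriptors (lists of floats).
    Each pyramid level n splits the grid into n x n cells; one VLAD vector
    is computed per cell and all of them are concatenated (hypothetical
    combination rule), so the output keeps coarse spatial layout."""
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for n in levels:
        for r in range(n):
            for c in range(n):
                rows = range(r * h // n, (r + 1) * h // n)
                cols = range(c * w // n, (c + 1) * w // n)
                cell = [feature_map[y][x] for y in rows for x in cols]
                out.extend(vlad(cell, centers))
    norm = math.sqrt(sum(v * v for v in out))
    return [v / norm for v in out] if norm > 0 else out


# Toy input: a 2 x 2 map of 2-D descriptors and two cluster centers.
centers = [[0.0, 0.0], [1.0, 1.0]]
feature_map = [
    [[0.1, 0.0], [0.9, 1.0]],
    [[0.0, 0.2], [1.0, 0.8]],
]
descriptor = spatial_pyramid_vlad(feature_map, centers)
```

With levels 1 and 2 the descriptor holds one VLAD vector for the whole map plus one per quadrant, so its length is (1 + 4) times the number of centers times the descriptor dimension. Two images with the same local features in different positions therefore get different representations, which is the structural information a plain VLAD vector discards.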
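The weakly supervised sample selection and the weighted triplet loss described in the abstract can be sketched as follows. The abstract does not give the exact weighting scheme or selection rule, so this is only one plausible reading: the weights `w_pos` and `w_neg`, the `margin` value, the "closest candidate" selection rule, and the function names are hypothetical and should not be taken as the paper's formulation.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two image representations."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_triplet(query, gps_near, gps_far):
    """GPS tags only give candidate sets, because a nearby image may face
    a different direction. Assumed rule: among GPS-near candidates take
    the one closest in descriptor space as the positive, and among
    GPS-far images take the closest one as a hard negative."""
    positive = min(gps_near, key=lambda p: sq_dist(query, p))
    negative = min(gps_far, key=lambda n: sq_dist(query, n))
    return positive, negative


def weighted_triplet_loss(query, positive, negative,
                          w_pos=1.0, w_neg=1.0, margin=0.1):
    """Triplet hinge loss with separate weights on the positive and
    negative distance terms (hypothetical weighting). Setting both
    weights to 1.0 recovers the standard triplet loss."""
    d_pos = sq_dist(query, positive)
    d_neg = sq_dist(query, negative)
    return max(0.0, w_pos * d_pos - w_neg * d_neg + margin)


# Toy example with 2-D representations.
query = [0.0, 0.0]
gps_near = [[1.0, 0.0], [0.1, 0.0]]   # candidates within the GPS radius
gps_far = [[2.0, 0.0], [0.2, 0.0]]    # images far away by GPS
positive, negative = select_triplet(query, gps_near, gps_far)
loss = weighted_triplet_loss(query, positive, negative)
```

In this toy case the hard negative lies almost as close to the query as the positive, so the margin is violated and the loss is positive. Raising `w_neg` relative to `w_pos` changes how strongly each term contributes, which is the kind of control over the optimization that a weighted loss provides.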