IEEE Trans Pattern Anal Mach Intell. 2016 Mar;38(3):546-62. doi: 10.1109/TPAMI.2015.2453950.
One-shot, generic object detection involves searching for a single query object in a larger target image. Relevant approaches have benefited from features that typically model local similarity patterns. In this paper, we combine local similarity (encoded by local descriptors) with a global context (i.e., a graph structure) of pairwise affinities among the local descriptors, embedding the query descriptors into a low-dimensional but discriminative subspace. Unlike principal components, which preserve the global structure of the feature space, we seek a linear approximation to the Laplacian eigenmap that yields a locality-preserving embedding of high-dimensional region descriptors. Our second contribution is an accelerated but exact computation of matrix cosine similarity as the decision rule for detection, obviating the computationally expensive sliding-window search. We combine the Fourier transform with the integral image to achieve superior runtime efficiency, which allows us to test multiple hypotheses (for pose estimation) within a reasonably short time. Our approach to one-shot detection is training-free, and experiments on standard data sets confirm the efficacy of our model. Moreover, the low computational cost of the proposed (codebook-free) object detector facilitates straightforward query detection in large data sets, including movie videos.
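The linear approximation to the Laplacian eigenmap mentioned above is commonly known as locality preserving projections (LPP). The following is a minimal NumPy sketch of that general technique, not the paper's implementation. The function name `lpp`, the fully connected heat-kernel graph, the bandwidth `sigma`, and the regularizer `reg` are all assumptions made for illustration; the abstract does not specify how the affinity graph is built.

```python
import numpy as np

def lpp(X, n_components, sigma=1.0, reg=1e-6):
    """Locality preserving projections (illustrative sketch).

    X has shape (n_samples, n_features). Returns a projection matrix of
    shape (n_features, n_components) and the matching eigenvalues.
    """
    # Pairwise affinities among descriptors: a heat kernel on a fully
    # connected graph (an assumption; the source does not specify this).
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    D = np.diag(W.sum(axis=1))   # degree matrix
    L = D - W                    # graph Laplacian

    # Generalized eigenproblem: (X^T L X) a = lambda (X^T D X) a.
    A = X.T @ L @ X
    B = X.T @ D @ X + reg * np.eye(X.shape[1])  # small ridge for stability

    # Reduce to a standard symmetric problem with the Cholesky factor B = C C^T.
    C = np.linalg.cholesky(B)
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T
    M = (M + M.T) / 2.0          # remove rounding asymmetry

    vals, vecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    # The smallest eigenvalues give the directions that best preserve locality.
    P = C_inv.T @ vecs[:, :n_components]
    return P, vals[:n_components]
```

A hypothetical use would be `P, _ = lpp(descriptors, 4)` followed by `embedded = descriptors @ P`, where `descriptors` holds one high-dimensional region descriptor per row.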
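The idea of replacing a sliding-window search with the Fourier transform and an integral image can be sketched as follows. This is an illustrative NumPy example, not the authors' code. It assumes that matrix cosine similarity is the Frobenius inner product of the query and window feature matrices divided by the product of their Frobenius norms; the function names and the `(height, width, channels)` feature layout are hypothetical. The numerator for every window comes from one FFT-based cross-correlation, and the per-window norm comes from an integral image of squared features, so the result matches a brute-force scan up to floating-point rounding.

```python
import numpy as np

def window_sums(a, h, w):
    """Sum of `a` over every h-by-w window, using an integral image."""
    s = np.zeros((a.shape[0] + 1, a.shape[1] + 1))
    s[1:, 1:] = a.cumsum(axis=0).cumsum(axis=1)
    return s[h:, w:] - s[:-h, w:] - s[h:, :-w] + s[:-h, :-w]

def correlate_valid(target, query):
    """Cross-correlation at all valid offsets, computed with the FFT."""
    H, W, _ = target.shape
    h, w, _ = query.shape
    Ft = np.fft.rfft2(target, s=(H, W), axes=(0, 1))
    Fq = np.fft.rfft2(query, s=(H, W), axes=(0, 1))  # zero-padded to H x W
    corr = np.fft.irfft2(Ft * np.conj(Fq), s=(H, W), axes=(0, 1))
    # Sum over feature channels and keep offsets without wrap-around.
    return corr.sum(axis=2)[: H - h + 1, : W - w + 1]

def matrix_cosine_similarity_map(target, query, eps=1e-12):
    """Cosine similarity between `query` and every same-sized target window."""
    h, w, _ = query.shape
    numerator = correlate_valid(target, query)
    # Squared Frobenius norm of each window via the integral image.
    energy = window_sums((target ** 2).sum(axis=2), h, w)
    denominator = np.linalg.norm(query) * np.sqrt(np.maximum(energy, 0.0)) + eps
    return numerator / denominator
```

The location of the maximum of the returned map, for example `np.unravel_index(np.argmax(rho), rho.shape)`, gives the top-left corner of the best-matching window under these assumptions.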