IEEE Trans Image Process. 2014 Apr;23(4):1765-78. doi: 10.1109/TIP.2014.2307480.
Detecting generic object categories in images and videos is a fundamental problem in computer vision. However, it faces challenges from interclass and intraclass diversity, as well as from distortions caused by viewpoint, pose, deformation, and so on. To handle these object variations, this paper constructs a structure kernel and proposes a multiscale part-based model that incorporates the discriminative power of kernels. The structure kernel measures the resemblance of part-based objects in three aspects: 1) a global similarity term, which measures the resemblance of the global visual appearance of the objects; 2) a part similarity term, which measures the resemblance of the visual appearance of distinctive parts; and 3) a spatial similarity term, which measures the resemblance of the spatial layout of the parts. In essence, the structure kernel penalizes the deformation of parts in a multiscale space with respect to horizontal displacement, vertical displacement, and scale difference. Part similarities are combined with different weights, which are optimized efficiently by a normalized stochastic gradient ascent algorithm to maximize intraclass similarity and minimize interclass similarity. In addition, the parameters of the structure kernel are learned during training from the distribution of the data, which makes them more discriminative. Because part sizes are flexible in scale and displacement, the model is more robust to intraclass variation, pose, and viewpoint. Theoretical analysis and experimental evaluations demonstrate that the proposed multiscale part-based representation model with the structure kernel is accurate and robust, and that it outperforms state-of-the-art object classification approaches.
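The three terms of the structure kernel can be illustrated with a minimal Python sketch. It is not the paper's formulation, since the abstract does not give the exact functional forms. The sketch assumes Gaussian (RBF) appearance similarities, a Gaussian penalty on horizontal displacement, vertical displacement, and scale difference, parts that are already matched by index, and an additive combination of the global term with the weighted part terms. All names (`Part`, `PartBasedObject`, `structure_kernel`, `gamma`, `sigma_x`, `sigma_y`, `sigma_s`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Part:
    feature: Sequence[float]  # appearance descriptor of the part (assumed)
    x: float                  # horizontal position relative to the object
    y: float                  # vertical position relative to the object
    scale: float              # scale level of the part


@dataclass
class PartBasedObject:
    global_feature: Sequence[float]  # appearance descriptor of the whole object
    parts: List[Part]


def rbf(u: Sequence[float], v: Sequence[float], gamma: float) -> float:
    """Gaussian similarity between two descriptors (an assumed choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def spatial_similarity(p: Part, q: Part,
                       sigma_x: float, sigma_y: float, sigma_s: float) -> float:
    """Penalize horizontal displacement, vertical displacement, and scale difference."""
    penalty = (((p.x - q.x) / sigma_x) ** 2
               + ((p.y - q.y) / sigma_y) ** 2
               + ((p.scale - q.scale) / sigma_s) ** 2)
    return math.exp(-0.5 * penalty)


def structure_kernel(a: PartBasedObject, b: PartBasedObject,
                     part_weights: Sequence[float], gamma: float = 1.0,
                     sigma_x: float = 1.0, sigma_y: float = 1.0,
                     sigma_s: float = 1.0) -> float:
    """Global similarity plus weighted part terms (assumed additive combination).

    Each part term is the product of a part appearance similarity and a
    spatial similarity; parts are assumed to correspond by index.
    """
    k_global = rbf(a.global_feature, b.global_feature, gamma)
    k_parts = 0.0
    for w, p, q in zip(part_weights, a.parts, b.parts):
        k_parts += (w * rbf(p.feature, q.feature, gamma)
                    * spatial_similarity(p, q, sigma_x, sigma_y, sigma_s))
    return k_global + k_parts
```

In this sketch, `part_weights`, `gamma`, and the `sigma_*` values are fixed inputs. In the paper, the weights are optimized by normalized stochastic gradient ascent and the kernel parameters are learned during training; neither step is reproduced here.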