US Research Center, Sony Electronics, Inc., San Jose, CA 95112, USA.
IEEE Trans Pattern Anal Mach Intell. 2011 Mar;33(3):514-30. doi: 10.1109/TPAMI.2010.117.
Object detection is challenging when the object class exhibits large within-class variations. In this work, we show that foreground-background classification (detection) and within-class classification of the foreground class (pose estimation) can be learned jointly, using a kernel formed as the product of two kernel functions. The model is trained with standard SVM learning. When foreground object masks are provided in training, the detectors can also produce object segmentations. We also propose a tracking-by-detection framework, built on our model, that recovers the foreground state in video sequences. We demonstrate the advantages of our method on object detection, view angle estimation, and tracking. Our approach compares favorably to existing methods on hand and vehicle detection tasks. Quantitative tracking results are given on sequences of moving vehicles and human faces.
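As an illustration of the multiplicative form, the following minimal sketch scores an (appearance, pose) pair with a product of two kernels and picks the pose that maximizes the score. It is not the authors' implementation: the choice of Gaussian RBF kernels, the function names (`rbf`, `product_kernel`, `decision`, `best_pose`), the bandwidth values, and the toy support vectors are all assumptions made for illustration. The coefficients stand in for the values a standard SVM solver would return.

```python
import math

def rbf(a, b, gamma):
    """Gaussian RBF kernel between two equal-length vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

def product_kernel(x1, theta1, x2, theta2, gamma_x=0.5, gamma_theta=2.0):
    """Product of an appearance kernel and a pose kernel (assumed RBF for both)."""
    return rbf(x1, x2, gamma_x) * rbf(theta1, theta2, gamma_theta)

def decision(x, theta, support, bias=0.0):
    """SVM-style score: sum_i coef_i * K((x_i, theta_i), (x, theta)) + bias.

    `support` is a list of (coef, x_i, theta_i) tuples; coef plays the role
    of alpha_i * y_i from a trained SVM (hypothetical values here).
    """
    return sum(c * product_kernel(xi, ti, x, theta) for c, xi, ti in support) + bias

def best_pose(x, candidate_poses, support, bias=0.0):
    """Return the candidate pose with the highest score for appearance x."""
    return max(candidate_poses, key=lambda th: decision(x, th, support, bias))

# Toy data (hypothetical): appearance [1, 0] goes with pose 0, [0, 1] with pose 1.
support = [
    (1.0, [1.0, 0.0], [0.0]),
    (1.0, [0.0, 1.0], [1.0]),
]
query = [1.0, 0.0]
poses = [[0.0], [0.5], [1.0]]
estimated = best_pose(query, poses, support)
```

Fixing the pose argument turns `decision` into a pose-specific detector, which is one way to read "detection and pose estimation learned jointly": a single set of coefficients serves every pose.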