IEEE Trans Image Process. 2017 Mar;26(3):1253-1263. doi: 10.1109/TIP.2017.2651367. Epub 2017 Jan 10.
Object detection is one of the most important tasks in computer vision. It is usually performed by evaluating only a subset of the possible locations in an image: those more likely to contain the object of interest. Exhaustive approaches have now been superseded by object proposal methods. The interplay between detectors and proposal algorithms has not yet been fully analyzed or exploited, although it is highly relevant to object detection in video sequences. We propose connecting detectors and object proposal generators in a closed loop that exploits the ordered and continuous nature of video sequences. Unlike tracking, our approach requires only the previous frame to improve both proposals and detections: no prediction based on local motion is performed, which avoids tracking errors. We obtain an improvement of three to four points in mAP and a detection time lower than that of Faster R-CNN (Regions with CNN features), the fastest generic object detector based on a Convolutional Neural Network (CNN) known at the time of writing.
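The closed-loop idea can be illustrated with a minimal sketch. The abstract does not specify how detections are fed back, so the mechanism below is an assumption: confident detections from the previous frame are simply reused as extra proposals for the current frame. The names `closed_loop_detect`, `propose`, `detect`, and `feedback_threshold` are hypothetical and do not come from the paper.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, confidence score)


def closed_loop_detect(
    frames: Iterable[object],
    propose: Callable[[object], Sequence[Box]],
    detect: Callable[[object, Sequence[Box]], Sequence[Detection]],
    feedback_threshold: float = 0.5,
) -> List[List[Detection]]:
    """Run a detector over a video, feeding detections back into proposals.

    Assumption (not taken from the paper): the feedback consists of reusing
    confident boxes from the previous frame as additional proposals.
    No motion model is used, so only the previous frame is needed.
    """
    results: List[List[Detection]] = []
    previous: List[Detection] = []
    for frame in frames:
        # Proposals generated from the current frame alone.
        proposals = list(propose(frame))
        # Feedback step: add confident boxes detected in the previous frame.
        proposals.extend(
            box for box, score in previous if score >= feedback_threshold
        )
        detections = list(detect(frame, proposals))
        results.append(detections)
        previous = detections
    return results
```

In this sketch, `propose` and `detect` stand in for any proposal generator and any proposal-based detector; an object missed by the proposal generator in one frame can still be scored if it was confidently detected in the frame before.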