Iterative Multiple Bounding-Box Refinements for Visual Tracking.
Author information
Cruciata Giorgio, Lo Presti Liliana, La Cascia Marco
Affiliation
Dipartimento di Ingegneria, University of Palermo, 90128 Palermo, Italy.
Publication information
J Imaging. 2022 Mar 3;8(3):61. doi: 10.3390/jimaging8030061.
Single-object visual tracking aims at locating a target in each video frame by predicting the bounding box of the object. Recent approaches have adopted iterative procedures to gradually refine the bounding box and locate the target in the image. In such approaches, the deep model takes as input the image patch corresponding to the currently estimated target bounding box, and outputs the probability associated with each of the possible bounding box refinements, generally defined as a discrete set of linear transformations of the bounding box center and size. At each iteration, only one transformation is applied, and supervised training of the model may introduce an inherent ambiguity by prioritizing some transformations over others. This paper proposes a novel formulation of the problem of selecting the bounding box refinement. It introduces the concept of non-conflicting transformations and allows multiple refinements to be applied to the target bounding box at each iteration without introducing ambiguities during learning of the model parameters. Empirical results demonstrate that the proposed approach improves on iterative single-refinement tracking in terms of the accuracy and precision of the tracking results.
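To make the idea of non-conflicting refinements concrete, below is a minimal Python sketch of how several refinements might be applied to a bounding box within a single iteration. The grouping of transformations, step sizes, probability threshold, and function names are illustrative assumptions, not the paper's actual parameterization, and a real tracker would re-predict the refinement probabilities from the newly cropped image patch at every iteration.

```python
# Hypothetical discrete refinements, grouped so that transformations within a
# group conflict with one another (e.g. "shift left" vs. "shift right") while
# transformations from different groups can be combined in the same iteration.
# Each refinement is a tuple (dcx, dcy, dw, dh) of relative changes.
# NOTE: group names, step sizes, and the threshold are assumptions for
# illustration only.
GROUPS = {
    "horizontal": {"left":  (-0.05, 0.0, 0.0, 0.0),
                   "right": (+0.05, 0.0, 0.0, 0.0)},
    "vertical":   {"up":    (0.0, -0.05, 0.0, 0.0),
                   "down":  (0.0, +0.05, 0.0, 0.0)},
    "width":      {"narrower": (0.0, 0.0, -0.05, 0.0),
                   "wider":    (0.0, 0.0, +0.05, 0.0)},
    "height":     {"shorter":  (0.0, 0.0, 0.0, -0.05),
                   "taller":   (0.0, 0.0, 0.0, +0.05)},
}
THRESHOLD = 0.5  # assumed minimum probability to accept a refinement


def refine_once(box, probs):
    """Apply at most one refinement per non-conflicting group.

    box   : (cx, cy, w, h) in normalized image coordinates.
    probs : dict mapping refinement name -> probability predicted by the model.
    """
    cx, cy, w, h = box
    for group in GROUPS.values():
        # Pick the most likely refinement within the group.
        name = max(group, key=lambda n: probs.get(n, 0.0))
        if probs.get(name, 0.0) < THRESHOLD:
            continue  # no confident refinement from this group
        dcx, dcy, dw, dh = group[name]
        # Linear transformations of the box center and size.
        cx, cy = cx + dcx * w, cy + dcy * h
        w, h = w * (1.0 + dw), h * (1.0 + dh)
    return (cx, cy, w, h)


if __name__ == "__main__":
    box = (0.50, 0.50, 0.20, 0.30)
    # Toy probabilities; in practice these would come from the deep model and
    # would be recomputed from the updated image patch at each iteration.
    probs = {"right": 0.8, "up": 0.7, "wider": 0.6, "left": 0.1}
    for _ in range(3):
        box = refine_once(box, probs)
    print(box)
```

In this sketch, the "right", "up", and "wider" refinements are all applied in the same pass because they belong to different groups, whereas a single-refinement scheme would apply only one of them per iteration.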