Dunnhofer Matteo, Furnari Antonino, Farinella Giovanni Maria, Micheloni Christian
Machine Learning and Perception Lab, University of Udine, Via delle Scienze 206, 33100 Udine, Italy.
Image Processing Laboratory, University of Catania, Viale A. Doria 6, 95125 Catania, Italy.
Int J Comput Vis. 2023;131(1):259-283. doi: 10.1007/s11263-022-01694-6. Epub 2022 Oct 18.
The understanding of human-object interactions is fundamental in First Person Vision (FPV). Visual tracking algorithms that follow the objects manipulated by the camera wearer can provide useful information to effectively model such interactions. In recent years, the computer vision community has significantly improved the performance of tracking algorithms for a large variety of target objects and scenarios. Despite a few previous attempts to exploit trackers in the FPV domain, a methodical analysis of the performance of state-of-the-art trackers is still missing. This research gap raises the question of whether current solutions can be used "off-the-shelf" or whether more domain-specific investigations should be carried out. This paper aims to answer this question. We present the first systematic investigation of single object tracking in FPV. Our study extensively analyses the performance of 42 algorithms, including generic object trackers and baseline FPV-specific trackers. The analysis focuses on different aspects of the FPV setting, introduces new performance measures, and is carried out in relation to FPV-specific tasks. The study is made possible through the introduction of TREK-150, a novel benchmark dataset composed of 150 densely annotated video sequences. Our results show that object tracking in FPV poses new challenges to current visual trackers. We highlight the factors causing such behavior and point out possible research directions. Despite these difficulties, we show that trackers bring benefits to FPV downstream tasks requiring short-term object tracking. We expect that generic object tracking will gain popularity in FPV as new and FPV-specific methodologies are investigated.
The online version contains supplementary material available at 10.1007/s11263-022-01694-6.