Deep Affinity Network for Multiple Object Tracking

Author Information

Sun ShiJie, Akhtar Naveed, Song HuanSheng, Mian Ajmal, Shah Mubarak

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2021 Jan;43(1):104-119. doi: 10.1109/TPAMI.2019.2929520. Epub 2020 Dec 4.

Abstract

Multiple Object Tracking (MOT) plays an important role in solving many fundamental problems in video analysis and computer vision. Most MOT methods employ two steps: object detection and data association. The first step detects objects of interest in every frame of a video, and the second establishes correspondences between the detected objects across frames to obtain their tracks. Object detection has made tremendous progress in the last few years due to deep learning. However, data association for tracking still relies on hand-crafted constraints such as appearance, motion, spatial proximity, and grouping to compute affinities between the objects in different frames. In this paper, we harness the power of deep learning for data association in tracking by jointly modeling object appearances and their affinities between different frames in an end-to-end fashion. The proposed Deep Affinity Network (DAN) learns compact yet comprehensive features of pre-detected objects at several levels of abstraction, and performs exhaustive pairing permutations of those features in any two frames to infer object affinities. DAN also accounts for multiple objects appearing and disappearing between video frames. We exploit the resulting efficient affinity computations to associate objects in the current frame with objects deep into the previous frames for reliable online tracking. Our technique is evaluated on the popular multiple object tracking challenges MOT15, MOT17, and UA-DETRAC. Comprehensive benchmarking under twelve evaluation metrics demonstrates that our approach is among the best performing techniques on the leaderboards for these challenges. The open-source implementation of our work is available at https://github.com/shijieS/SST.git.
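
The core of the association step, exhaustive pairwise scoring of object features from two frames with extra slots for objects that appear or disappear, can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' DAN (the actual model lives in the linked repository): cosine similarity stands in for DAN's learned affinity head, the Hungarian solver is a common association choice rather than necessarily the paper's exact inference procedure, and unmatched_score is a hypothetical constant.

import numpy as np
from scipy.optimize import linear_sum_assignment


def pairwise_affinity(feats_a, feats_b):
    """Exhaustively pair every object feature from frame A with every one
    from frame B and score each pair.

    DAN scores feature pairs with a learned affinity head; cosine
    similarity is used here purely as a stand-in for that scorer.
    feats_a: (N, D) array, one row per detected object in frame A.
    feats_b: (M, D) array, one row per detected object in frame B.
    Returns an (N, M) affinity matrix.
    """
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    return a @ b.T  # affinity[i, j] scores object i of A against object j of B


def associate(affinity, unmatched_score=0.1):
    """Solve the frame-to-frame assignment, allowing objects to go unmatched.

    The affinity matrix is embedded in a square (N+M) x (N+M) matrix whose
    extra rows and columns act as dummy slots, so an object can match a
    dummy instead of a real object, modeling objects that disappear (rows)
    or newly appear (columns). unmatched_score is an illustrative constant,
    not a value from the paper.
    """
    n, m = affinity.shape
    padded = np.full((n + m, n + m), unmatched_score)
    padded[:n, :m] = affinity
    rows, cols = linear_sum_assignment(-padded)  # maximize total affinity
    # Keep only real-object pairs; anything assigned to a dummy slot is
    # treated as a track end (lost object) or a track birth (new object).
    return [(i, j) for i, j in zip(rows, cols) if i < n and j < m]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f_prev = rng.standard_normal((4, 128))  # 4 objects in the previous frame
    f_curr = rng.standard_normal((5, 128))  # 5 objects in the current frame
    matches = associate(pairwise_affinity(f_prev, f_curr))
    print(matches)  # real-object matches; unmatched detections start new tracks

In DAN itself, the feature extractor and the affinity estimation are trained jointly end to end, and appearing or disappearing objects are handled by padding the affinity matrix within the network; the padded-assignment idea above imitates that behavior under the stated assumptions.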
