IoUformer: Pseudo-IoU prediction with transformer for visual tracking.

Authors

Cai Huayue, Lan Long, Zhang Jing, Zhang Xiang, Zhan Yibing, Luo Zhigang

Affiliations

Institute for Quantum & State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha, China.

Publication information

Neural Netw. 2024 Feb;170:548-563. doi: 10.1016/j.neunet.2023.10.055. Epub 2023 Nov 15.

Abstract

Siamese tracking has made tremendous progress as a tracking paradigm. However, its default box estimation pipeline still suffers from a crucial inconsistency: the bounding box selected by the classification score does not always overlap best with the ground truth, which harms performance. To address this, we explore a novel, simple tracking paradigm based on intersection over union (IoU) value prediction. To bypass the inconsistency, we propose a concise target state predictor termed IoUformer which, instead of following the default box estimation pipeline, directly predicts the IoU values related to tracking performance metrics. In detail, it extends the long-range dependency modeling ability of the transformer to jointly capture target-aware interactions between the target template and the search region, as well as interactions among search sub-regions, thus neatly unifying global semantic interaction and target state prediction. Thanks to this joint strength, IoUformer predicts reliable IoU values that are near-linear with the ground truth, which paves a safe way for our new IoU-based Siamese tracking paradigm. Since exploring this paradigm with satisfactory efficacy and portability is non-trivial, we provide the corresponding network components and two alternative localization methods. Experimental results show that our IoUformer-based tracker achieves promising results with less training data. As for its applicability, it can also serve as a refinement module that consistently boosts existing advanced trackers.
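The inconsistency described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the candidate boxes, the classification scores, and the helper names `iou` and `argmax` are all hypothetical, chosen only to show how the highest-scoring box can differ from the best-overlapping one. The ground-truth IoU is used here as a stand-in for an ideal IoU predictor; at inference time a tracker would have to rely on predicted values instead.

```python
from typing import Sequence, Tuple

# Axis-aligned box as (x1, y1, x2, y2); this layout is an assumption of the sketch.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical data: one ground-truth box and three candidate boxes.
ground_truth: Box = (10, 10, 50, 50)
candidates = [(12, 12, 52, 52), (10, 10, 40, 40), (30, 30, 70, 70)]
cls_scores = [0.70, 0.90, 0.40]  # made-up classification scores

# Stand-in for an ideal IoU predictor: the true overlap with the ground truth.
ious = [iou(c, ground_truth) for c in candidates]

by_score = argmax(cls_scores)  # box the classification score would select
by_iou = argmax(ious)          # box an accurate IoU estimate would select

print("selected by score:", by_score, "IoU =", round(ious[by_score], 4))
print("selected by IoU:  ", by_iou, "IoU =", round(ious[by_iou], 4))
```

In this toy setup the score-based choice picks the second candidate while the IoU-based choice picks the first, which overlaps the ground truth more; that gap is the kind of mismatch the abstract refers to.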
