Exploring Multi-Modal Spatial-Temporal Contexts for High-Performance RGB-T Tracking.

Authors

Zhang Tianlu, Jiao Qiang, Zhang Qiang, Han Jungong

Publication information

IEEE Trans Image Process. 2024;33:4303-4318. doi: 10.1109/TIP.2024.3428316. Epub 2024 Jul 30.

DOI: 10.1109/TIP.2024.3428316
PMID: 39028600
Abstract

In RGB-T tracking, there exist rich spatial relationships between the target and backgrounds within multi-modal data as well as sound consistencies of spatial relationships among successive frames, which are crucial for boosting the tracking performance. However, most existing RGB-T trackers overlook such multi-modal spatial relationships and temporal consistencies within RGB-T videos, hindering them from robust tracking and practical applications in complex scenarios. In this paper, we propose a novel Multi-modal Spatial-Temporal Context (MMSTC) network for RGB-T tracking, which employs a Transformer architecture for the construction of reliable multi-modal spatial context information and the effective propagation of temporal context information. Specifically, a Multi-modal Transformer Encoder (MMTE) is designed to achieve the encoding of reliable multi-modal spatial contexts as well as the fusion of multi-modal features. Furthermore, a Quality-aware Transformer Decoder (QATD) is proposed to effectively propagate the tracking cues from historical frames to the current frame, which facilitates the object searching process. Moreover, the proposed MMSTC network can be easily extended to various tracking frameworks. New state-of-the-art results on five prevalent RGB-T tracking benchmarks demonstrate the superiorities of our proposed trackers over existing ones.
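
The two ideas named in the abstract, joint encoding of RGB and thermal features and quality-aware propagation of cues from earlier frames, can be illustrated with a small sketch. This is not the authors' MMTE or QATD, whose internals the abstract does not describe. It assumes plain single-head self-attention over the concatenated tokens with no learned projections, and it assumes that a per-frame quality score is already available. The function names `joint_attention` and `propagate_history` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def joint_attention(rgb_tokens, thermal_tokens):
    """Self-attention over the concatenated RGB and thermal tokens.

    Assumption: queries, keys and values are the raw tokens themselves
    (no learned projections), so every output token is a weighted
    average of tokens drawn from both modalities.
    """
    tokens = rgb_tokens + thermal_tokens
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for query in tokens:
        weights = softmax([dot(query, key) * scale for key in tokens])
        fused.append(
            [sum(w * value[i] for w, value in zip(weights, tokens)) for i in range(dim)]
        )
    return fused


def propagate_history(history, qualities):
    """Quality-weighted average of cue vectors from earlier frames.

    Assumption: `qualities` holds one scalar score per historical frame;
    how such a score would be estimated is outside this sketch.
    """
    weights = softmax(qualities)
    dim = len(history[0])
    return [sum(w * cue[i] for w, cue in zip(weights, history)) for i in range(dim)]


# Toy data: two RGB tokens and one thermal token, each of dimension 2.
rgb = [[1.0, 0.0], [0.0, 1.0]]
thermal = [[1.0, 1.0]]
fused = joint_attention(rgb, thermal)

# Two historical cue vectors; the first frame is scored as more reliable.
cue = propagate_history([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
```

In this toy setting the propagated cue leans toward the frame with the higher quality score, which is the intuition behind weighting historical frames by reliability rather than averaging them uniformly.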


Similar articles

1. Exploring Multi-Modal Spatial-Temporal Contexts for High-Performance RGB-T Tracking.
   IEEE Trans Image Process. 2024;33:4303-4318. doi: 10.1109/TIP.2024.3428316. Epub 2024 Jul 30.
2. AMST: aggregated multi-level spatial and temporal context-based transformer for robust aerial tracking.
   Sci Rep. 2023 Jun 4;13(1):9062. doi: 10.1038/s41598-023-36131-2.
3. Object Tracking in RGB-T Videos Using Modal-Aware Attention Network and Competitive Learning.
   Sensors (Basel). 2020 Jan 10;20(2):393. doi: 10.3390/s20020393.
4. CAVER: Cross-Modal View-Mixed Transformer for Bi-Modal Salient Object Detection.
   IEEE Trans Image Process. 2023;32:892-904. doi: 10.1109/TIP.2023.3234702. Epub 2023 Jan 23.
5. RGB-T Tracking With Template-Bridged Search Interaction and Target-Preserved Template Updating.
   IEEE Trans Pattern Anal Mach Intell. 2025 Jan;47(1):634-649. doi: 10.1109/TPAMI.2024.3475472. Epub 2024 Dec 4.
6. Middle-Level Feature Fusion for Lightweight RGB-D Salient Object Detection.
   IEEE Trans Image Process. 2022;31:6621-6634. doi: 10.1109/TIP.2022.3214092. Epub 2022 Oct 26.
7. QueryTrack: Joint-Modality Query Fusion Network for RGBT Tracking.
   IEEE Trans Image Process. 2024;33:3187-3199. doi: 10.1109/TIP.2024.3393298. Epub 2024 May 6.
8. Cross-Modal Object Tracking via Modality-Aware Fusion Network and a Large-Scale Dataset.
   IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):6981-6994. doi: 10.1109/TNNLS.2024.3406189. Epub 2025 Apr 8.
9. Channel Exchanging for RGB-T Tracking.
   Sensors (Basel). 2021 Aug 28;21(17):5800. doi: 10.3390/s21175800.
10. 3-D Convolutional Neural Networks for RGB-D Salient Object Detection and Beyond.
    IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):4309-4323. doi: 10.1109/TNNLS.2022.3202241. Epub 2024 Feb 29.