

In Defense of Clip-Based Video Relation Detection.

Author Information

Wei Meng, Chen Long, Ji Wei, Yue Xiaoyu, Zimmermann Roger

Publication Information

IEEE Trans Image Process. 2024;33:2759-2769. doi: 10.1109/TIP.2024.3379935. Epub 2024 Apr 9.

DOI: 10.1109/TIP.2024.3379935
PMID: 38530734
Abstract

Video Visual Relation Detection (VidVRD) aims to detect visual relationship triplets in videos using spatial bounding boxes and temporal boundaries. Existing VidVRD methods can be broadly categorized into bottom-up and top-down paradigms, depending on their approach to classifying relations. Bottom-up methods follow a clip-based approach where they classify relations of short clip tubelet pairs and then merge them into long video relations. On the other hand, top-down methods directly classify long video tubelet pairs. While recent video-based methods utilizing video tubelets have shown promising results, we argue that the effective modeling of spatial and temporal context plays a more significant role than the choice between clip tubelets and video tubelets. This motivates us to revisit the clip-based paradigm and explore the key success factors in VidVRD. In this paper, we propose a Hierarchical Context Model (HCM) that enriches the object-based spatial context and relation-based temporal context based on clips. We demonstrate that using clip tubelets can achieve superior performance compared to most video-based methods. Additionally, using clip tubelets offers more flexibility in model designs and helps alleviate the limitations associated with video tubelets, such as the challenging long-term object tracking problem and the loss of temporal information in long-term tubelet feature compression. Extensive experiments conducted on two challenging VidVRD benchmarks validate that our HCM achieves a new state-of-the-art performance, highlighting the effectiveness of incorporating advanced spatial and temporal context modeling within the clip-based paradigm.
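The bottom-up (clip-based) paradigm the abstract describes, predicting relation triplets per short clip and then merging them into long video-level relations, can be sketched as follows. This is a minimal illustration under simplified assumptions (triplets matched by identical subject/predicate/object labels across adjacent clips); the data structures and function names are hypothetical and are not taken from the paper's implementation.

```python
# Hypothetical sketch of the bottom-up (clip-based) VidVRD merging step:
# relations predicted on short clips are greedily chained into video-level
# relations when the same triplet persists across consecutive clips.
from dataclasses import dataclass

@dataclass
class ClipRelation:
    clip_idx: int      # index of the short clip this prediction came from
    subject: str
    predicate: str
    object: str
    score: float       # classifier confidence for this clip-level triplet

def merge_clip_relations(clip_relations):
    """Greedily merge per-clip triplets into video-level relations.

    Predictions carrying the same (subject, predicate, object) triplet in
    consecutive clips are chained into one segment. Returns a list of
    (triplet, start_clip, end_clip, mean_score) tuples.
    """
    # Group predictions by triplet, visiting clips in temporal order.
    groups = {}
    for r in sorted(clip_relations, key=lambda r: r.clip_idx):
        groups.setdefault((r.subject, r.predicate, r.object), []).append(r)

    merged = []
    for triplet, rels in groups.items():
        start = prev = rels[0].clip_idx
        scores = [rels[0].score]
        for r in rels[1:]:
            if r.clip_idx == prev + 1:   # contiguous clip: extend the segment
                scores.append(r.score)
            else:                        # temporal gap: close current segment
                merged.append((triplet, start, prev, sum(scores) / len(scores)))
                start, scores = r.clip_idx, [r.score]
            prev = r.clip_idx
        merged.append((triplet, start, prev, sum(scores) / len(scores)))
    return merged
```

A real system would match tubelets by spatial overlap rather than by label equality alone; this sketch only shows why clip-based methods avoid long-term tracking, since association is needed only between adjacent short clips.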


Similar Articles

1
In Defense of Clip-Based Video Relation Detection.
IEEE Trans Image Process. 2024;33:2759-2769. doi: 10.1109/TIP.2024.3379935. Epub 2024 Apr 9.
2
Object Detection in Videos by High Quality Object Linking.
IEEE Trans Pattern Anal Mach Intell. 2020 May;42(5):1272-1278. doi: 10.1109/TPAMI.2019.2910529. Epub 2019 Apr 11.
3
Video summarization for event-centric videos.
Neural Netw. 2023 Apr;161:359-370. doi: 10.1016/j.neunet.2023.01.047. Epub 2023 Feb 3.
4
A naturalistic viewing paradigm using 360° panoramic video clips and real-time field-of-view changes with eye-gaze tracking.
Neuroimage. 2020 Aug 1;216:116617. doi: 10.1016/j.neuroimage.2020.116617. Epub 2020 Feb 10.
5
Cascaded Inner-Outer Clip Retformer for Ultrasound Video Object Segmentation.
IEEE J Biomed Health Inform. 2024 Oct 7;PP. doi: 10.1109/JBHI.2024.3464732.
6
Tracklet Pair Proposal and Context Reasoning for Video Scene Graph Generation.
Sensors (Basel). 2021 May 2;21(9):3164. doi: 10.3390/s21093164.
7
Relational Reasoning Over Spatial-Temporal Graphs for Video Summarization.
IEEE Trans Image Process. 2022;31:3017-3031. doi: 10.1109/TIP.2022.3163855. Epub 2022 Apr 11.
8
LRTD: long-range temporal dependency based active learning for surgical workflow recognition.
Int J Comput Assist Radiol Surg. 2020 Sep;15(9):1573-1584. doi: 10.1007/s11548-020-02198-9. Epub 2020 Jun 25.
9
STMixer: A One-Stage Sparse Action Detector.
IEEE Trans Pattern Anal Mach Intell. 2024 Oct;46(10):6842-6857. doi: 10.1109/TPAMI.2024.3387127. Epub 2024 Sep 5.
10
Long Short-Term Relation Transformer With Global Gating for Video Captioning.
IEEE Trans Image Process. 2022;31:2726-2738. doi: 10.1109/TIP.2022.3158546. Epub 2022 Mar 29.