Xu Zhe, Chen Da, Wei Kun, Deng Cheng, Xue Hui
IEEE Trans Image Process. 2022;31:5178-5188. doi: 10.1109/TIP.2022.3191841. Epub 2022 Aug 4.
Video Temporal Grounding (VTG) aims to locate the time interval in a video that is semantically relevant to a language query. Existing VTG methods let the query interact with entangled video features and treat the instances in a dataset independently; as a result, intra-video entanglement and inter-video connections are rarely considered, leading to mismatches between video and language. To this end, we propose a novel method, dubbed Hierarchically Semantic Associating (HiSA), which aims to precisely align video with language and obtain discriminative representations for subsequent location regression. Specifically, action factors and background factors are disentangled from adjacent video segments, enforcing precise multimodal interaction and alleviating intra-video entanglement. In addition, cross-guided contrast is carefully designed to capture inter-video connections, which benefits multimodal understanding when locating the time interval. Extensive experiments on three benchmark datasets demonstrate that our approach significantly outperforms state-of-the-art methods. The project page is available at https://github.com/zhexu1997/HiSA.
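The abstract does not give the exact form of the cross-guided contrast; as a rough illustration of the inter-video contrastive idea only, the PyTorch sketch below pulls each video embedding toward its paired query and pushes it away from the queries of other instances in the batch. The function name, batch-level positives, temperature, and symmetric bidirectional loss are assumptions for this sketch, not the paper's actual HiSA formulation.

    import torch
    import torch.nn.functional as F

    def inter_video_contrastive_loss(video_emb, text_emb, temperature=0.07):
        # Illustrative InfoNCE-style loss over a batch of paired
        # video/query embeddings (assumed shape (B, D)). Each video is
        # attracted to its own query (diagonal positives) and repelled
        # from the queries of other instances, and symmetrically for
        # the query-to-video direction.
        video_emb = F.normalize(video_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)

        # (B, B) cosine-similarity matrix scaled by temperature;
        # entry (i, j) compares video i with query j.
        logits = video_emb @ text_emb.t() / temperature
        targets = torch.arange(logits.size(0), device=logits.device)

        # Symmetric cross-entropy over both matching directions.
        loss_v2t = F.cross_entropy(logits, targets)
        loss_t2v = F.cross_entropy(logits.t(), targets)
        return 0.5 * (loss_v2t + loss_t2v)

    # Usage with dummy pooled embeddings (batch of 32, dimension 256):
    loss = inter_video_contrastive_loss(torch.randn(32, 256), torch.randn(32, 256))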