

Local Correspondence Network for Weakly Supervised Temporal Sentence Grounding.

Authors

Yang Wenfei, Zhang Tianzhu, Zhang Yongdong, Wu Feng

Publication

IEEE Trans Image Process. 2021;30:3252-3262. doi: 10.1109/TIP.2021.3058614. Epub 2021 Mar 2.

DOI: 10.1109/TIP.2021.3058614
PMID: 33596176
Abstract

Weakly supervised temporal sentence grounding has better scalability and practicability than fully supervised methods in real-world application scenarios. However, most existing methods cannot model the fine-grained video-text local correspondences well and lack effective supervision for correspondence learning, thus yielding unsatisfactory performance. To address these issues, we propose an end-to-end Local Correspondence Network (LCNet) for weakly supervised temporal sentence grounding. The proposed LCNet enjoys several merits. First, we represent video and text features in a hierarchical manner to model the fine-grained video-text correspondences. Second, we design a self-supervised cycle-consistent loss as learning guidance for video-text matching. To the best of our knowledge, this is the first work to fully explore the fine-grained correspondences between video and text for temporal sentence grounding using self-supervised learning. Extensive experimental results on two benchmark datasets demonstrate that the proposed LCNet significantly outperforms existing weakly supervised methods.
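The cycle-consistent loss mentioned in the abstract can be illustrated with a minimal sketch. This is a toy under stated assumptions, not the paper's implementation: the dot-product attention, the identity round-trip target, and the cross-entropy form are all illustrative choices, and the function names are made up here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cycle_consistency_loss(video, text):
    """Round-trip alignment loss (illustrative).

    video: (T, d) clip features; text: (N, d) word features.
    Each clip attends to words, each word attends back to clips;
    the composed clip->clip distribution should concentrate on the
    starting clip, which an identity target + cross-entropy enforces.
    """
    a_vt = softmax(video @ text.T)   # (T, N) clip -> word attention
    a_tv = softmax(text @ video.T)   # (N, T) word -> clip attention
    round_trip = a_vt @ a_tv         # (T, T) composed, row-stochastic
    target = np.eye(video.shape[0])  # each clip should cycle back to itself
    return -np.mean(np.sum(target * np.log(round_trip + 1e-8), axis=1))
```

The appeal of such a loss in the weakly supervised setting is that it needs no temporal annotations: the identity target is free supervision generated by the round trip itself.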


Similar Articles

1. Local Correspondence Network for Weakly Supervised Temporal Sentence Grounding.
   IEEE Trans Image Process. 2021;30:3252-3262. doi: 10.1109/TIP.2021.3058614. Epub 2021 Mar 2.
2. Multi-Scale Structure-Aware Network for Weakly Supervised Temporal Action Detection.
   IEEE Trans Image Process. 2021;30:5848-5861. doi: 10.1109/TIP.2021.3089361. Epub 2021 Jun 24.
3. Weakly-Supervised Video Object Grounding via Causal Intervention.
   IEEE Trans Pattern Anal Mach Intell. 2023 Mar;45(3):3933-3948. doi: 10.1109/TPAMI.2022.3180025. Epub 2023 Feb 3.
4. Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos.
   IEEE Trans Pattern Anal Mach Intell. 2022 May;44(5):2725-2741. doi: 10.1109/TPAMI.2020.3038993. Epub 2022 Apr 1.
5. Cycle-Consistent Weakly Supervised Visual Grounding With Individual and Contextual Representations.
   IEEE Trans Image Process. 2023;32:5167-5180. doi: 10.1109/TIP.2023.3311917. Epub 2023 Sep 15.
6. Latent Space Semantic Supervision Based on Knowledge Distillation for Cross-Modal Retrieval.
   IEEE Trans Image Process. 2022;31:7154-7164. doi: 10.1109/TIP.2022.3220051. Epub 2022 Nov 16.
7. Event-Oriented State Alignment Network for Weakly Supervised Temporal Language Grounding.
   Entropy (Basel). 2024 Aug 27;26(9):730. doi: 10.3390/e26090730.
8. Entity-Enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding.
   IEEE Trans Pattern Anal Mach Intell. 2023 Mar;45(3):3003-3018. doi: 10.1109/TPAMI.2022.3186410. Epub 2023 Feb 3.
9. Single-Frame Supervision for Spatio-Temporal Video Grounding.
   IEEE Trans Pattern Anal Mach Intell. 2025 Jul;47(7):5177-5191. doi: 10.1109/TPAMI.2024.3415087.
10. Multimodal and multiscale feature fusion for weakly supervised video anomaly detection.
    Sci Rep. 2024 Oct 1;14(1):22835. doi: 10.1038/s41598-024-73462-0.

Cited By

1. Text-Guided Visual Representation Optimization for Sensor-Acquired Video Temporal Grounding.
   Sensors (Basel). 2025 Jul 30;25(15):4704. doi: 10.3390/s25154704.
2. Event-Oriented State Alignment Network for Weakly Supervised Temporal Language Grounding.
   Entropy (Basel). 2024 Aug 27;26(9):730. doi: 10.3390/e26090730.