Suppr 超能文献


MABAN: Multi-Agent Boundary-Aware Network for Natural Language Moment Retrieval.

Publication Information

IEEE Trans Image Process. 2021;30:5589-5599. doi: 10.1109/TIP.2021.3086591. Epub 2021 Jun 16.

DOI: 10.1109/TIP.2021.3086591
PMID: 34110992
Abstract

The amount of video on the Internet and from electronic surveillance cameras is growing dramatically, and paired sentence descriptions are significant clues for selecting attention-worthy content from videos. The task of natural language moment retrieval (NLMR), which aims to associate specific video moments with text descriptions depicting complex scenarios and multiple activities, has drawn great interest from both academia and industry. In general, NLMR requires temporal context to be properly comprehended, and existing studies suffer from two problems: (1) limited moment selection and (2) insufficient comprehension of structural context. To address these issues, a multi-agent boundary-aware network (MABAN) is proposed in this work. To guarantee flexible and goal-oriented moment selection, MABAN utilizes multi-agent reinforcement learning to decompose NLMR into localizing the two temporal boundary points of each moment. Specifically, MABAN employs a two-phase cross-modal interaction to exploit rich contextual semantic information. Moreover, temporal distance regression is used to deduce the temporal boundaries, with which the agents can enhance their comprehension of structural context. Extensive experiments on two challenging benchmark datasets, ActivityNet Captions and Charades-STA, demonstrate the effectiveness of the proposed approach compared with state-of-the-art methods. The project page can be found at https://mic.tongji.edu.cn/e5/23/c9778a189731/page.htm.
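The boundary decomposition the abstract describes — one agent per temporal boundary, each refining its endpoint step by step — can be sketched as a toy illustration. This is not the authors' implementation: the paper trains the two agents with multi-agent reinforcement learning (plus temporal distance regression), whereas the stand-in below uses a greedy search over joint boundary moves against a known target; the function names `temporal_iou` and `refine_boundaries` are invented for the example.

```python
def temporal_iou(a, b):
    """Temporal IoU between two moments given as (start, end) in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def refine_boundaries(pred, target, step=1.0, max_iters=50):
    """Two 'agents' each propose shifting their boundary by -step, 0, or
    +step; greedily keep the joint action that most improves IoU with the
    target moment (a stand-in for the learned policies in the paper)."""
    start, end = pred
    for _ in range(max_iters):
        best, best_iou = (start, end), temporal_iou((start, end), target)
        for ds in (-step, 0.0, step):       # start-boundary agent's action
            for de in (-step, 0.0, step):   # end-boundary agent's action
                s, e = start + ds, end + de
                if s < e and temporal_iou((s, e), target) > best_iou:
                    best, best_iou = (s, e), temporal_iou((s, e), target)
        if best == (start, end):
            break  # no joint move improves IoU: converged
        start, end = best
    return start, end

# A loose initial candidate is tightened onto the target moment.
refined = refine_boundaries(pred=(3.0, 20.0), target=(8.0, 15.0))
```

In the actual MABAN setting there is, of course, no ground-truth target at inference time; the agents act on cross-modal features of the video and the query, which is what makes the learned policies necessary.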


Similar Articles

1
MABAN: Multi-Agent Boundary-Aware Network for Natural Language Moment Retrieval.
IEEE Trans Image Process. 2021;30:5589-5599. doi: 10.1109/TIP.2021.3086591. Epub 2021 Jun 16.
2
Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos.
IEEE Trans Image Process. 2021;30:8265-8277. doi: 10.1109/TIP.2021.3113791. Epub 2021 Sep 30.
3
Text-Based Localization of Moments in a Video Corpus.
IEEE Trans Image Process. 2021;30:8886-8899. doi: 10.1109/TIP.2021.3120038. Epub 2021 Oct 28.
4
Multi-Level Content-Aware Boundary Detection for Temporal Action Proposal Generation.
IEEE Trans Image Process. 2023;32:6090-6101. doi: 10.1109/TIP.2023.3328471. Epub 2023 Nov 8.
5
Multi-Scale 2D Temporal Adjacency Networks for Moment Localization With Natural Language.
IEEE Trans Pattern Anal Mach Intell. 2022 Dec;44(12):9073-9087. doi: 10.1109/TPAMI.2021.3120745. Epub 2022 Nov 7.
6
Moment Retrieval via Cross-Modal Interaction Networks with Query Reconstruction.
IEEE Trans Image Process. 2020 Jan 17. doi: 10.1109/TIP.2020.2965987.
7
Interaction-Integrated Network for Natural Language Moment Localization.
IEEE Trans Image Process. 2021;30:2538-2548. doi: 10.1109/TIP.2021.3052086. Epub 2021 Feb 3.
8
AAP-MIT: Attentive Atrous Pyramid Network and Memory Incorporated Transformer for Multisentence Video Description.
IEEE Trans Image Process. 2022;31:5559-5569. doi: 10.1109/TIP.2022.3195643. Epub 2022 Aug 26.
9
SDN: Semantic Decoupling Network for Temporal Language Grounding.
IEEE Trans Neural Netw Learn Syst. 2024 May;35(5):6598-6612. doi: 10.1109/TNNLS.2022.3211850. Epub 2024 May 2.
10
Video Moment Retrieval With Cross-Modal Neural Architecture Search.
IEEE Trans Image Process. 2022;31:1204-1216. doi: 10.1109/TIP.2022.3140611. Epub 2022 Jan 19.

Cited By

1
Event-Oriented State Alignment Network for Weakly Supervised Temporal Language Grounding.
Entropy (Basel). 2024 Aug 27;26(9):730. doi: 10.3390/e26090730.