Suppr 超能文献


MABAN: Multi-Agent Boundary-Aware Network for Natural Language Moment Retrieval.

Publication Information

IEEE Trans Image Process. 2021;30:5589-5599. doi: 10.1109/TIP.2021.3086591. Epub 2021 Jun 16.

DOI: 10.1109/TIP.2021.3086591
PMID: 34110992
Abstract

The amount of video on the Internet and from electronic surveillance cameras is growing dramatically, and paired sentence descriptions are significant clues for selecting attention-worthy content from videos. The task of natural language moment retrieval (NLMR), which aims to associate specific video moments with text descriptions depicting complex scenarios and multiple activities, has drawn great interest from both academia and industry. In general, NLMR requires temporal context to be properly comprehended, and existing studies suffer from two problems: (1) limited moment selection and (2) insufficient comprehension of structural context. To address these issues, a multi-agent boundary-aware network (MABAN) is proposed in this work. To guarantee flexible and goal-oriented moment selection, MABAN utilizes multi-agent reinforcement learning to decompose NLMR into localizing the two temporal boundary points of each moment. Specifically, MABAN employs a two-phase cross-modal interaction to exploit rich contextual semantic information. Moreover, temporal distance regression is used to deduce the temporal boundaries, with which the agents can enhance their comprehension of structural context. Extensive experiments on two challenging benchmark datasets, ActivityNet Captions and Charades-STA, demonstrate the effectiveness of the proposed approach compared with state-of-the-art methods. The project page can be found at https://mic.tongji.edu.cn/e5/23/c9778a189731/page.htm.
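The boundary decomposition the abstract describes — one agent per temporal boundary, each refining its endpoint step by step — can be sketched as a toy illustration. This is not the authors' implementation: the paper trains the two agents with multi-agent reinforcement learning (plus temporal distance regression), whereas the stand-in below uses a greedy search over joint boundary moves against a known target; the function names `temporal_iou` and `refine_boundaries` are invented for the example.

```python
def temporal_iou(a, b):
    """Temporal IoU between two moments given as (start, end) in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def refine_boundaries(pred, target, step=1.0, max_iters=50):
    """Two 'agents' each propose shifting their boundary by -step, 0, or
    +step; greedily keep the joint action that most improves IoU with the
    target moment (a stand-in for the learned policies in the paper)."""
    start, end = pred
    for _ in range(max_iters):
        best, best_iou = (start, end), temporal_iou((start, end), target)
        for ds in (-step, 0.0, step):       # start-boundary agent's action
            for de in (-step, 0.0, step):   # end-boundary agent's action
                s, e = start + ds, end + de
                if s < e and temporal_iou((s, e), target) > best_iou:
                    best, best_iou = (s, e), temporal_iou((s, e), target)
        if best == (start, end):
            break  # no joint move improves IoU: converged
        start, end = best
    return start, end

# A loose initial candidate is tightened onto the target moment.
refined = refine_boundaries(pred=(3.0, 20.0), target=(8.0, 15.0))
```

In the actual MABAN setting there is, of course, no ground-truth target at inference time; the agents act on cross-modal features of the video and the query, which is what makes the learned policies necessary.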


Similar Articles

1
MABAN: Multi-Agent Boundary-Aware Network for Natural Language Moment Retrieval.
IEEE Trans Image Process. 2021;30:5589-5599. doi: 10.1109/TIP.2021.3086591. Epub 2021 Jun 16.
2
Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos.
IEEE Trans Image Process. 2021;30:8265-8277. doi: 10.1109/TIP.2021.3113791. Epub 2021 Sep 30.
3
Text-Based Localization of Moments in a Video Corpus.
IEEE Trans Image Process. 2021;30:8886-8899. doi: 10.1109/TIP.2021.3120038. Epub 2021 Oct 28.
4
Multi-Level Content-Aware Boundary Detection for Temporal Action Proposal Generation.
IEEE Trans Image Process. 2023;32:6090-6101. doi: 10.1109/TIP.2023.3328471. Epub 2023 Nov 8.
5
Multi-Scale 2D Temporal Adjacency Networks for Moment Localization With Natural Language.
IEEE Trans Pattern Anal Mach Intell. 2022 Dec;44(12):9073-9087. doi: 10.1109/TPAMI.2021.3120745. Epub 2022 Nov 7.
6
Moment Retrieval via Cross-Modal Interaction Networks with Query Reconstruction.
IEEE Trans Image Process. 2020 Jan 17. doi: 10.1109/TIP.2020.2965987.
7
Interaction-Integrated Network for Natural Language Moment Localization.
IEEE Trans Image Process. 2021;30:2538-2548. doi: 10.1109/TIP.2021.3052086. Epub 2021 Feb 3.
8
AAP-MIT: Attentive Atrous Pyramid Network and Memory Incorporated Transformer for Multisentence Video Description.
IEEE Trans Image Process. 2022;31:5559-5569. doi: 10.1109/TIP.2022.3195643. Epub 2022 Aug 26.
9
SDN: Semantic Decoupling Network for Temporal Language Grounding.
IEEE Trans Neural Netw Learn Syst. 2024 May;35(5):6598-6612. doi: 10.1109/TNNLS.2022.3211850. Epub 2024 May 2.
10
Video Moment Retrieval With Cross-Modal Neural Architecture Search.
IEEE Trans Image Process. 2022;31:1204-1216. doi: 10.1109/TIP.2022.3140611. Epub 2022 Jan 19.

Cited By

1
Event-Oriented State Alignment Network for Weakly Supervised Temporal Language Grounding.
Entropy (Basel). 2024 Aug 27;26(9):730. doi: 10.3390/e26090730.