Relational Reasoning Over Spatial-Temporal Graphs for Video Summarization.

Author Information

Zhu Wencheng, Han Yucheng, Lu Jiwen, Zhou Jie

Publication Information

IEEE Trans Image Process. 2022;31:3017-3031. doi: 10.1109/TIP.2022.3163855. Epub 2022 Apr 11.

Abstract

In this paper, we propose a dynamic graph modeling approach to learn spatial-temporal representations for video summarization. Most existing video summarization methods extract image-level features with ImageNet pre-trained deep models. In contrast, our method exploits object-level and relation-level information to capture spatial-temporal dependencies. Specifically, our method builds spatial graphs on the detected object proposals. Then, we construct a temporal graph from the aggregated representations of the spatial graphs. Afterward, we perform relational reasoning over the spatial and temporal graphs with graph convolutional networks and extract spatial-temporal representations for importance score prediction and key shot selection. To eliminate relation clutter caused by densely connected nodes, we further design a self-attention edge pooling module that discards meaningless graph relations. We conduct extensive experiments on two popular benchmarks, the SumMe and TVSum datasets. Experimental results demonstrate that the proposed method outperforms state-of-the-art video summarization methods.
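
The abstract outlines a pipeline: spatial graphs over detected object proposals, aggregation into frame-level nodes, a temporal graph over frames, GCN-based relational reasoning, self-attention edge pooling to prune cluttered relations, and per-frame importance scoring. A minimal PyTorch sketch of that pipeline is given below; the layer sizes, mean-pooling aggregation, top-k edge-pruning rule, and every class and parameter name are illustrative assumptions rather than the authors' released implementation.

```python
# Hypothetical sketch of a spatial-temporal graph summarizer (not the authors' code).
import torch
import torch.nn as nn


class GCNLayer(nn.Module):
    """One graph-convolution step: row-normalized adjacency followed by a linear map."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (N, in_dim); adj: (N, N), assumed to include self-loops.
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.linear((adj / deg) @ x))


class SelfAttentionEdgePooling(nn.Module):
    """Scores edges with scaled dot-product attention and keeps only the strongest
    relations per node -- a simplified stand-in for the paper's edge pooling module."""
    def __init__(self, dim, keep_ratio=0.5):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.keep_ratio = keep_ratio

    def forward(self, x, adj):
        scores = (self.query(x) @ self.key(x).t()) / x.size(-1) ** 0.5
        scores = scores.masked_fill(adj == 0, float("-inf"))
        k = max(1, int(self.keep_ratio * adj.size(-1)))
        topk = scores.topk(k, dim=-1).indices          # strongest relations per node
        pruned = torch.zeros_like(adj)
        pruned.scatter_(-1, topk, 1.0)
        return pruned * adj                            # drop "relation clutter" edges


class SpatialTemporalGraphSummarizer(nn.Module):
    def __init__(self, obj_dim=1024, hid_dim=256):
        super().__init__()
        self.spatial_gcn = GCNLayer(obj_dim, hid_dim)
        self.temporal_gcn = GCNLayer(hid_dim, hid_dim)
        self.edge_pool = SelfAttentionEdgePooling(hid_dim)
        self.score_head = nn.Sequential(nn.Linear(hid_dim, 1), nn.Sigmoid())

    def forward(self, object_feats, spatial_adjs, temporal_adj):
        # object_feats: list of (N_t, obj_dim) proposal features, one per frame
        # spatial_adjs: list of (N_t, N_t) adjacencies over proposals
        # temporal_adj: (T, T) adjacency over frames
        frame_feats = []
        for feats, adj in zip(object_feats, spatial_adjs):
            nodes = self.spatial_gcn(feats, adj)        # reason over objects
            frame_feats.append(nodes.mean(dim=0))       # aggregate into one frame node
        frames = torch.stack(frame_feats)               # (T, hid_dim)
        pruned_adj = self.edge_pool(frames, temporal_adj)
        frames = self.temporal_gcn(frames, pruned_adj)  # reason over frames
        return self.score_head(frames).squeeze(-1)      # per-frame importance scores
```

In this sketch the predicted importance scores would be trained against ground-truth frame annotations and then used to select key shots under a summary-length budget; the paper's exact aggregation, pooling, and selection details should be taken from the article itself.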
