School of Artificial Intelligence, Guilin University of Electronic Technology, Jinji Road, Guilin 541004, China.
Sensors (Basel). 2022 Oct 10;22(19):7689. doi: 10.3390/s22197689.
Deep summarization models have succeeded in the video summarization field based on the development of gated recurrent unit (GRU) and long short-term memory (LSTM) technology. However, for some long videos, GRUs and LSTMs cannot effectively capture long-term dependencies. This paper proposes a deep summarization network with auxiliary summarization losses to address this problem. We introduce an unsupervised auxiliary summarization loss module with an LSTM and a swish activation function to capture long-term dependencies for video summarization; the module can be easily integrated into various networks. The proposed model is an unsupervised deep reinforcement learning framework that does not depend on any labels or user interactions. Additionally, we implement a reward function R(S) that jointly considers the consistency, diversity, and representativeness of the generated summaries. Furthermore, the proposed model is lightweight and can be deployed on mobile devices, enhancing the mobile user experience and reducing the load on servers. We conducted experiments on two benchmark datasets, and the results demonstrate that our unsupervised approach obtains better summaries than existing video summarization methods. Moreover, the proposed algorithm achieves higher F-scores, with an increase of nearly 6.3% on the SumMe dataset and 2.2% on the TVSum dataset compared with the DR-DSN model.
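To make the abstract's main ingredients concrete, the sketch below shows, in PyTorch, an LSTM frame scorer whose hidden states pass through a swish activation, together with diversity and representativeness rewards in the spirit of DR-DSN-style reinforcement learning. This is a minimal illustration only, not the authors' implementation: the class and function names, feature dimensions, and the exact reward formulas (including the omitted consistency term) are assumptions chosen for clarity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SwishLSTMScorer(nn.Module):
    # Hypothetical scorer: predicts a frame-importance probability from
    # pre-extracted CNN features, applying swish (x * sigmoid(x)) to the
    # LSTM hidden states before the final projection.
    def __init__(self, in_dim=1024, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, 1)

    def forward(self, feats):                      # feats: (B, T, in_dim)
        h, _ = self.lstm(feats)                    # (B, T, 2*hidden)
        h = h * torch.sigmoid(h)                   # swish activation
        return torch.sigmoid(self.fc(h)).squeeze(-1)   # frame scores in (0, 1)

def diversity_representativeness_reward(feats, picks):
    # feats: (T, D) frame features; picks: indices of selected key frames.
    sel = F.normalize(feats[picks], dim=1)
    # Diversity: mean pairwise cosine dissimilarity among selected frames.
    sim = sel @ sel.t()
    t = sel.size(0)
    r_div = (1.0 - sim).sum() / (t * (t - 1) + 1e-8)
    # Representativeness: every frame should lie close to some selected frame.
    dist = torch.cdist(feats, feats[picks])        # (T, |picks|)
    r_rep = torch.exp(-dist.min(dim=1).values.mean())
    return r_div + r_rep

In an unsupervised reinforcement learning setup of this kind, frames would be sampled from the scorer's output probabilities, the reward evaluated on the sampled summary, and the policy gradient used to update the scorer without any ground-truth labels.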