

DSNet: A Flexible Detect-to-Summarize Network for Video Summarization.

Author Information

Zhu Wencheng, Lu Jiwen, Li Jiahao, Zhou Jie

Publication Information

IEEE Trans Image Process. 2021;30:948-962. doi: 10.1109/TIP.2020.3039886. Epub 2020 Dec 8.

DOI: 10.1109/TIP.2020.3039886
PMID: 33259299
Abstract

In this paper, we propose a Detect-to-Summarize network (DSNet) framework for supervised video summarization. Our DSNet contains anchor-based and anchor-free counterparts. The anchor-based method generates temporal interest proposals to determine and localize the representative contents of video sequences, while the anchor-free method eliminates the pre-defined temporal proposals and directly predicts the importance scores and segment locations. Different from existing supervised video summarization methods which formulate video summarization as a regression problem without temporal consistency and integrity constraints, our interest detection framework is the first attempt to leverage temporal consistency via the temporal interest detection formulation. Specifically, in the anchor-based approach, we first provide a dense sampling of temporal interest proposals with multi-scale intervals that accommodate interest variations in length, and then extract their long-range temporal features for interest proposal location regression and importance prediction. Notably, positive and negative segments are both assigned for the correctness and completeness information of the generated summaries. In the anchor-free approach, we alleviate drawbacks of temporal proposals by directly predicting importance scores of video frames and segment locations. Particularly, the interest detection framework can be flexibly plugged into off-the-shelf supervised video summarization methods. We evaluate the anchor-based and anchor-free approaches on the SumMe and TVSum datasets. Experimental results clearly validate the effectiveness of the anchor-based and anchor-free approaches.
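The abstract describes the anchor-based branch as a dense sampling of temporal interest proposals with multi-scale intervals, one set per frame position, so that segments of varying length can be matched. A minimal sketch of that proposal-generation step is shown below; the scale values and function name are illustrative assumptions, not the authors' actual configuration.

```python
def generate_anchors(num_frames, scales=(4, 8, 16, 32)):
    """Densely sample multi-scale temporal anchors.

    Places one anchor per frame position and per scale, returning
    (start, end) frame intervals clipped to the video bounds, so both
    short and long segments of interest have a candidate proposal.
    """
    anchors = []
    for center in range(num_frames):
        for length in scales:
            start = max(0, center - length // 2)
            end = min(num_frames, center + length // 2)
            anchors.append((start, end))
    return anchors

# Example: a 10-frame clip with 4 scales yields 10 x 4 = 40 proposals.
proposals = generate_anchors(10)
print(len(proposals))  # 40
```

In the full method, each such proposal would then be scored for importance and refined by location regression, while the anchor-free branch skips this enumeration and predicts frame-level scores and segment boundaries directly.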


Similar Articles

1
DSNet: A Flexible Detect-to-Summarize Network for Video Summarization.
IEEE Trans Image Process. 2021;30:948-962. doi: 10.1109/TIP.2020.3039886. Epub 2020 Dec 8.
2
Interp-SUM: Unsupervised Video Summarization with Piecewise Linear Interpolation.
Sensors (Basel). 2021 Jul 2;21(13):4562. doi: 10.3390/s21134562.
3
Relational Reasoning Over Spatial-Temporal Graphs for Video Summarization.
IEEE Trans Image Process. 2022;31:3017-3031. doi: 10.1109/TIP.2022.3163855. Epub 2022 Apr 11.
4
Unsupervised Video Summarization Based on Deep Reinforcement Learning with Interpolation.
Sensors (Basel). 2023 Mar 23;23(7):3384. doi: 10.3390/s23073384.
5
Video Summarization With Spatiotemporal Vision Transformer.
IEEE Trans Image Process. 2023;32:3013-3026. doi: 10.1109/TIP.2023.3275069. Epub 2023 May 26.
6
A Video Summarization Model Based on Deep Reinforcement Learning with Long-Term Dependency.
Sensors (Basel). 2022 Oct 10;22(19):7689. doi: 10.3390/s22197689.
7
AudioVisual Video Summarization.
IEEE Trans Neural Netw Learn Syst. 2023 Aug;34(8):5181-5188. doi: 10.1109/TNNLS.2021.3119969. Epub 2023 Aug 4.
8
Property-Constrained Dual Learning for Video Summarization.
IEEE Trans Neural Netw Learn Syst. 2020 Oct;31(10):3989-4000. doi: 10.1109/TNNLS.2019.2951680. Epub 2019 Dec 5.
9
Unleashing the Power of Contrastive Learning for Zero-Shot Video Summarization.
J Imaging. 2024 Sep 14;10(9):229. doi: 10.3390/jimaging10090229.
10
User-Ranking Video Summarization with Multi-Stage Spatio-Temporal Representation.
IEEE Trans Image Process. 2018 Dec 21. doi: 10.1109/TIP.2018.2889265.

Cited By

1
Key frame extraction algorithm for surveillance videos using an evolutionary approach.
Sci Rep. 2025 Jan 2;15(1):536. doi: 10.1038/s41598-024-84324-0.
2
Behavioral profiling for adaptive video summarization: From generalization to personalization.
MethodsX. 2024 Jun 14;13:102780. doi: 10.1016/j.mex.2024.102780. eCollection 2024 Dec.