Graph Convolutional Dictionary Selection With L₂,ₚ Norm for Video Summarization.

Author information

Ma Mingyang, Mei Shaohui, Wan Shuai, Wang Zhiyong, Hua Xian-Sheng, Feng David Dagan

Publication information

IEEE Trans Image Process. 2022;31:1789-1804. doi: 10.1109/TIP.2022.3146012. Epub 2022 Feb 10.

DOI:10.1109/TIP.2022.3146012

Abstract

Video Summarization (VS) has become one of the most effective solutions for quickly understanding a large volume of video data. Dictionary selection with self-representation and sparse regularization has shown promise for VS by formulating the problem as a sparse selection task on video frames. However, existing dictionary selection models are generally designed only for data reconstruction, and therefore neglect the inherent structured information among video frames. In addition, the sparsity commonly constrained by the L₂,₁ norm is not strong enough, which causes redundancy among keyframes, i.e., similar keyframes are selected. To address these two issues, in this paper we propose a general framework called graph convolutional dictionary selection with L₂,ₚ (0 < p ≤ 1) norm (GCDS₂,ₚ) for both keyframe selection and skimming-based summarization. Firstly, we incorporate graph embedding into dictionary selection to generate the graph embedding dictionary, which takes the structured information depicted in videos into account. Secondly, we propose to use L₂,ₚ (0 < p ≤ 1) norm constrained row sparsity, in which p can be flexibly set for the two forms of video summarization. For keyframe selection, 0 < p < 1 can be used to select diverse and representative keyframes; for skimming, p = 1 can be used to select key shots. In addition, an efficient iterative algorithm is devised to optimize the proposed model, and its convergence is theoretically proved. Experimental results on four benchmark datasets, covering both keyframe selection and skimming-based summarization, demonstrate the effectiveness and superiority of the proposed method.
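The abstract does not spell out the formula behind "L₂,ₚ norm constrained row sparsity". As a minimal sketch, assuming the common definition in which the penalty is the sum over rows of each row's L2 norm raised to the power p, the following Python computes that quantity for a toy coefficient matrix. The function names, the toy matrix, and the "keep the k rows with the largest norm" rule are hypothetical illustrations, not the authors' optimization algorithm.

```python
import math


def row_l2_norms(coeffs):
    """Return the L2 norm of each row of a matrix given as a list of lists."""
    return [math.sqrt(sum(v * v for v in row)) for row in coeffs]


def l2p_penalty(coeffs, p):
    """Return sum_i ||row_i||_2 ** p (assumed definition of the L2,p penalty)."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    return sum(norm ** p for norm in row_l2_norms(coeffs))


def select_rows(coeffs, k):
    """Hypothetical rule: indices of the k rows with the largest L2 norm."""
    norms = row_l2_norms(coeffs)
    ranked = sorted(range(len(norms)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:k])


# Toy coefficient matrix: one row per candidate frame (values are made up).
coeffs = [
    [3.0, 4.0],  # row norm 5
    [0.0, 0.0],  # row norm 0, i.e. this frame is not used at all
    [0.0, 1.0],  # row norm 1
]

print(l2p_penalty(coeffs, 1.0))  # p = 1 gives the L2,1 penalty
print(l2p_penalty(coeffs, 0.5))  # p < 1 rewards all-zero rows more strongly
print(select_rows(coeffs, 2))
```

Under this assumed definition, a smaller p makes a few large rows plus many all-zero rows cheaper than many moderately sized rows, which is consistent with the abstract's use of 0 < p < 1 to avoid redundant keyframes.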


Similar articles

Keyframe Extraction From Laparoscopic Videos via Diverse and Weighted Dictionary Selection.

IEEE J Biomed Health Inform. 2021 May;25(5):1686-1698. doi: 10.1109/JBHI.2020.3019198. Epub 2021 May 11.

Adaptive Greedy Dictionary Selection for Web Media Summarization.

IEEE Trans Image Process. 2017 Jan;26(1):185-195. doi: 10.1109/TIP.2016.2619260. Epub 2016 Oct 19.

Video Summarization Via Multiview Representative Selection.

IEEE Trans Image Process. 2018 May;27(5):2134-2145. doi: 10.1109/TIP.2017.2789332.

Scalable gastroscopic video summarization via similar-inhibition dictionary selection.

Artif Intell Med. 2016 Jan;66:1-13. doi: 10.1016/j.artmed.2015.08.006. Epub 2015 Aug 18.

Video Summarization for Sign Languages Using the Median of Entropy of Mean Frames Method.

Entropy (Basel). 2018 Sep 29;20(10):748. doi: 10.3390/e20100748.

Unsupervised Video Summarization Based on Deep Reinforcement Learning with Interpolation.

Sensors (Basel). 2023 Mar 23;23(7):3384. doi: 10.3390/s23073384.

Double-Structured Sparsity Guided Flexible Embedding Learning for Unsupervised Feature Selection.

IEEE Trans Neural Netw Learn Syst. 2024 Oct;35(10):13354-13367. doi: 10.1109/TNNLS.2023.3267184. Epub 2024 Oct 7.

Unsupervised Feature Selection With Constrained ℓ₂,₀-Norm and Optimized Graph.

IEEE Trans Neural Netw Learn Syst. 2022 Apr;33(4):1702-1713. doi: 10.1109/TNNLS.2020.3043362. Epub 2022 Apr 4.

Reconstructive Sequence-Graph Network for Video Summarization.

IEEE Trans Pattern Anal Mach Intell. 2022 May;44(5):2793-2801. doi: 10.1109/TPAMI.2021.3072117. Epub 2022 Apr 1.

PMID:35100116
