Mademlis Ioannis, Tefas Anastasios, Nikolaidis Nikos, Pitas Ioannis
IEEE Trans Image Process. 2016 Dec;25(12):5828-5840. doi: 10.1109/TIP.2016.2615289. Epub 2016 Oct 5.
Video summarization is a timely and rapidly developing research field with broad commercial interest, due to the increasing availability of massive video data. Relevant algorithms face the challenge of achieving a careful balance between summary compactness, enjoyability, and content coverage. The specific case of stereoscopic 3D theatrical films has grown in importance over the past years, but has not received corresponding research attention. In this paper, a multi-stage, multimodal summarization process for such stereoscopic movies is proposed, which is able to extract a short, representative video skim conforming to narrative characteristics from a 3D film. At the initial stage, a novel, low-level video frame description method is introduced (frame moments descriptor) that compactly captures informative image statistics from luminance, color, optical flow, and stereoscopic disparity video data, at both a global and a local scale. Thus, scene texture, illumination, motion, and geometry properties may succinctly be encoded within a single frame feature descriptor, which can subsequently be employed as a building block in any key-frame extraction scheme, e.g., for intra-shot frame clustering. The computed key-frames are then used to construct a movie summary in the form of a video skim, which is post-processed in a manner that also considers the audio modality. The next stage of the proposed summarization pipeline essentially performs shot pruning, controlled by a user-provided shot retention parameter, which removes segments from the skim based on the narrative prominence of movie characters in both the visual and the audio modalities. This novel process (multimodal shot pruning) is algebraically modeled as a multimodal matrix column subset selection problem, which is solved using an evolutionary computing approach. Subsequently, disorienting editing effects induced by summarization are dealt with through manipulation of the video skim. At the last step, the skim is suitably post-processed in order to reduce stereoscopic video defects that may cause visual fatigue.
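The abstract describes the frame moments descriptor only at a high level. A minimal sketch of the general idea, assuming mean, standard deviation, and skewness as the moments and a fixed spatial grid for the local scale (the paper's exact moments and grid layout are not specified here), could look like:

```python
import numpy as np

def frame_moments_descriptor(channels, grid=(4, 4)):
    """Illustrative sketch: concatenate low-order statistical moments
    (mean, standard deviation, skewness) of each input channel,
    computed globally and on a coarse spatial grid of blocks.

    `channels` is a list of 2-D arrays of equal shape, e.g. luminance,
    a color channel, optical-flow magnitude, and disparity.
    """
    def moments(a):
        a = a.astype(np.float64).ravel()
        mu = a.mean()
        sigma = a.std()
        # Skewness; guard against flat regions with zero variance.
        skew = ((a - mu) ** 3).mean() / sigma ** 3 if sigma > 0 else 0.0
        return [mu, sigma, skew]

    feats = []
    for ch in channels:
        feats.extend(moments(ch))                 # global scale
        h, w = ch.shape
        gy, gx = grid
        for i in range(gy):                       # local scale: grid blocks
            for j in range(gx):
                block = ch[i * h // gy:(i + 1) * h // gy,
                           j * w // gx:(j + 1) * w // gx]
                feats.extend(moments(block))
    return np.array(feats)
```

With two channels and a 4x4 grid, this yields a fixed-length vector (2 channels x 3 moments x 17 regions = 102 values) that can feed any key-frame extraction scheme, such as intra-shot frame clustering.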
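The multimodal shot pruning stage is modeled as a matrix column subset selection (CSS) problem solved with evolutionary computing. A toy genetic-algorithm sketch of generic CSS (the function name, operators, and Frobenius-norm fitness below are illustrative assumptions, not the paper's actual multimodal formulation):

```python
import numpy as np

def evolutionary_css(M, k, pop_size=40, generations=100, seed=0):
    """Toy genetic algorithm for column subset selection: choose k
    columns of M whose span best reconstructs M (Frobenius norm)."""
    rng = np.random.default_rng(seed)
    n = M.shape[1]

    def fitness(cols):
        C = M[:, cols]
        # Least-squares projection of M onto the span of the chosen columns.
        proj = C @ np.linalg.lstsq(C, M, rcond=None)[0]
        return -np.linalg.norm(M - proj)  # higher (less negative) is better

    # Each chromosome is a set of k distinct column indices.
    pop = [rng.choice(n, size=k, replace=False) for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        for _ in range(pop_size - len(elite)):
            a, b = rng.choice(len(elite), size=2, replace=False)
            genes = np.union1d(elite[a], elite[b])      # crossover: gene pool
            child = rng.choice(genes, size=k, replace=False)
            if rng.random() < 0.2:                      # mutation: swap a column
                child[rng.integers(k)] = rng.integers(n)
                child = np.unique(child)
            while len(child) < k:                       # repair duplicates
                child = np.unique(np.append(child, rng.integers(n)))
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)
```

In the paper's setting, each column would represent a shot described in both the visual and audio modalities, and the retained subset is constrained by the user-provided shot retention parameter (here simply `k`).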