Multimodal Stereoscopic Movie Summarization Conforming to Narrative Characteristics.

Authors

Mademlis Ioannis, Tefas Anastasios, Nikolaidis Nikos, Pitas Ioannis

Publication information

IEEE Trans Image Process. 2016 Dec;25(12):5828-5840. doi: 10.1109/TIP.2016.2615289. Epub 2016 Oct 5.

DOI:10.1109/TIP.2016.2615289
PMID:28113502
Abstract

Video summarization is a timely and rapidly developing research field with broad commercial interest, due to the increasing availability of massive video data. Relevant algorithms face the challenge of balancing summary compactness, enjoyability, and content coverage. The specific case of stereoscopic 3D theatrical films has become more important over the past years, but has not received corresponding research attention. In this paper, a multi-stage, multimodal summarization process for such stereoscopic movies is proposed that can extract a short, representative video skim conforming to narrative characteristics from a 3D film. At the initial stage, a novel, low-level video frame description method is introduced (frame moments descriptor) that compactly captures informative image statistics from luminance, color, optical flow, and stereoscopic disparity video data, at both global and local scales. Thus, scene texture, illumination, motion, and geometry properties may succinctly be contained within a single frame feature descriptor, which can subsequently be employed as a building block in any key-frame extraction scheme, e.g., for intra-shot frame clustering. The computed key-frames are then used to construct a movie summary in the form of a video skim, which is post-processed in a manner that also considers the audio modality. The next stage of the proposed summarization pipeline essentially performs shot pruning, controlled by a user-provided shot retention parameter, that removes segments from the skim based on the narrative prominence of movie characters in both the visual and the audio modalities. This novel process (multimodal shot pruning) is algebraically modeled as a multimodal matrix column subset selection problem, which is solved using an evolutionary computing approach. Subsequently, disorienting editing effects induced by summarization are dealt with through manipulation of the video skim. At the last step, the skim is suitably post-processed in order to reduce stereoscopic video defects that may cause visual fatigue.
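
The first stage described in the abstract (per-frame statistics at a global and a local scale, followed by key-frame selection within a shot) can be illustrated with a minimal sketch. This is not the paper's frame moments descriptor: the choice of moments (mean and standard deviation only), the regular grid of blocks, and the names `moments`, `frame_descriptor`, and `key_frame` are all assumptions made for illustration. The key-frame step is also simplified to a single cluster, picking the frame closest to the mean descriptor of the shot.

```python
from statistics import fmean, pstdev


def moments(values):
    """Mean and population standard deviation of a flat list of numbers."""
    return [fmean(values), pstdev(values)]


def block_values(channel, r0, r1, c0, c1):
    """Flatten the pixels of rows r0..r1-1 and columns c0..c1-1."""
    return [channel[r][c] for r in range(r0, r1) for c in range(c0, c1)]


def frame_descriptor(channels, grid=2):
    """Concatenate global and per-block moments for every channel.

    channels: list of 2D lists of equal size, for example luminance,
    a color component, optical-flow magnitude, and disparity
    (hypothetical inputs; how they are computed is out of scope here).
    grid: number of blocks per side for the local scale (assumed layout).
    Each channel must be at least grid x grid pixels.
    """
    descriptor = []
    for channel in channels:
        height, width = len(channel), len(channel[0])
        # Global scale: statistics over the whole frame.
        descriptor += moments(block_values(channel, 0, height, 0, width))
        # Local scale: statistics over each block of a grid x grid partition.
        for i in range(grid):
            for j in range(grid):
                r0, r1 = i * height // grid, (i + 1) * height // grid
                c0, c1 = j * width // grid, (j + 1) * width // grid
                descriptor += moments(block_values(channel, r0, r1, c0, c1))
    return descriptor


def key_frame(descriptors):
    """Index of the frame whose descriptor is closest to the shot mean.

    A one-cluster stand-in for intra-shot frame clustering.
    """
    centroid = [fmean(column) for column in zip(*descriptors)]

    def squared_distance(descriptor):
        return sum((a - b) ** 2 for a, b in zip(descriptor, centroid))

    return min(range(len(descriptors)),
               key=lambda i: squared_distance(descriptors[i]))
```

With `grid=2`, each channel contributes 2 moments for the whole frame plus 2 moments for each of the 4 blocks, so the descriptor length is 10 per channel. A fuller version would presumably use more clusters per shot and higher-order moments, but those details are not given in the abstract.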

Similar articles

1. Multimodal Stereoscopic Movie Summarization Conforming to Narrative Characteristics.
   IEEE Trans Image Process. 2016 Dec;25(12):5828-5840. doi: 10.1109/TIP.2016.2615289. Epub 2016 Oct 5.
2. Feature fusion and clustering for key frame extraction.
   Math Biosci Eng. 2021 Oct 27;18(6):9294-9311. doi: 10.3934/mbe.2021457.
3. Automatic summarization of soccer highlights using audio-visual descriptors.
   Springerplus. 2015 Jun 30;4:301. doi: 10.1186/s40064-015-1065-9. eCollection 2015.
4. Scalable gastroscopic video summarization via similar-inhibition dictionary selection.
   Artif Intell Med. 2016 Jan;66:1-13. doi: 10.1016/j.artmed.2015.08.006. Epub 2015 Aug 18.
5. Video summarization using line segments, angles and conic parts.
   PLoS One. 2017 Nov 9;12(11):e0181636. doi: 10.1371/journal.pone.0181636. eCollection 2017.
6. News Video Summarization Combining SURF and Color Histogram Features.
   Entropy (Basel). 2021 Jul 30;23(8):982. doi: 10.3390/e23080982.
7. Adaptive fusion of human visual sensitive features for surveillance video summarization.
   J Opt Soc Am A Opt Image Sci Vis. 2017 May 1;34(5):814-826. doi: 10.1364/JOSAA.34.000814.
8. Visual Attention Modeling for Stereoscopic Video: A Benchmark and Computational Model.
   IEEE Trans Image Process. 2017 Oct;26(10):4684-4696. doi: 10.1109/TIP.2017.2721112. Epub 2017 Jun 28.
9. Reconstructive Sequence-Graph Network for Video Summarization.
   IEEE Trans Pattern Anal Mach Intell. 2022 May;44(5):2793-2801. doi: 10.1109/TPAMI.2021.3072117. Epub 2022 Apr 1.
10. AudioVisual Video Summarization.
    IEEE Trans Neural Netw Learn Syst. 2023 Aug;34(8):5181-5188. doi: 10.1109/TNNLS.2021.3119969. Epub 2023 Aug 4.