
Gaze-enabled Egocentric Video Summarization via Constrained Submodular Maximization.

Authors

Xu Jia, Mukherjee Lopamudra, Li Yin, Warner Jamieson, Rehg James M, Singh Vikas

Affiliations

University of Wisconsin-Madison.

University of Wisconsin-Whitewater.

Publication

Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2015 Jun;2015:2235-2244. doi: 10.1109/CVPR.2015.7298836.

Abstract

With the proliferation of wearable cameras, the number of videos of users documenting their personal lives using such devices is rapidly increasing. Since such videos may span hours, there is an important need for mechanisms that represent the information content in a compact form (i.e., shorter videos which are more easily browsable/sharable). Motivated by these applications, this paper focuses on the problem of egocentric video summarization. Such videos are usually continuous with significant camera shake and other quality issues. Because of these reasons, there is growing consensus that direct application of standard video summarization tools to such data yields unsatisfactory performance. In this paper, we demonstrate that using gaze tracking information (such as fixation and saccade) significantly helps the summarization task. It allows meaningful comparison of different image frames and enables deriving personalized summaries (gaze provides a sense of the camera wearer's intent). We formulate a summarization model which captures common-sense properties of a good summary, and show that it can be solved as a submodular function maximization with partition matroid constraints, opening the door to a rich body of work from combinatorial optimization. We evaluate our approach on a new gaze-enabled egocentric video dataset (over 15 hours), which will be a valuable standalone resource.
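The abstract casts summary selection as maximizing a monotone submodular objective subject to a partition matroid constraint (e.g., a per-segment budget on how many frames the summary may draw from each portion of the video). A standard way to approximately solve such problems is the greedy algorithm, which carries a 1/2-approximation guarantee for monotone submodular objectives under matroid constraints. The sketch below is illustrative only: the coverage-style objective, the toy frame/segment data, and all function names are assumptions for exposition, not the paper's actual model or solver.

```python
def greedy_partition_matroid(frames, partition, capacity, gain):
    """Greedily maximize a monotone submodular objective under a
    partition matroid: at most capacity[b] frames from each block b
    (here, blocks stand in for temporal segments of the video)."""
    selected = []
    counts = {b: 0 for b in set(partition.values())}
    remaining = list(frames)
    while True:
        # Pick the feasible frame with the largest marginal gain;
        # ties break toward the earliest frame in the list.
        best, best_gain = None, 0.0
        for f in remaining:
            b = partition[f]
            if counts[b] >= capacity[b]:
                continue  # this segment's budget is exhausted
            g = gain(f, selected)
            if g > best_gain:
                best, best_gain = f, g
        if best is None:  # no feasible frame adds positive value
            break
        selected.append(best)
        counts[partition[best]] += 1
        remaining.remove(best)
    return selected

# Hypothetical toy data: each frame "covers" a set of visual concepts,
# and the objective rewards newly covered concepts (a coverage function,
# which is monotone submodular).
covers = {0: {1, 2}, 1: {2, 3}, 2: {1}, 3: {3, 4}, 4: {4}, 5: {5}}
partition = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
capacity = {"A": 1, "B": 1}  # at most one frame per temporal segment

def coverage_gain(f, selected):
    seen = set().union(*(covers[s] for s in selected)) if selected else set()
    return len(covers[f] - seen)

summary = greedy_partition_matroid(range(6), partition, capacity, coverage_gain)
# One representative frame per segment, chosen for maximal new coverage.
```

The partition matroid is what enforces temporal diversity: without it, a pure coverage objective could pack the whole budget into one eventful segment.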


Similar Articles

1. Gaze-enabled Egocentric Video Summarization via Constrained Submodular Maximization. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2015 Jun;2015:2235-2244. doi: 10.1109/CVPR.2015.7298836.
2. Scalable gastroscopic video summarization via similar-inhibition dictionary selection. Artif Intell Med. 2016 Jan;66:1-13. doi: 10.1016/j.artmed.2015.08.006. Epub 2015 Aug 18.
3. Generating Personalized Summaries of Day Long Egocentric Videos. IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):6832-6845. doi: 10.1109/TPAMI.2021.3118077. Epub 2023 May 5.
4. Human Eye Movements Reveal Video Frame Importance. Computer (Long Beach Calif). 2019 May;52(5):48-57. doi: 10.1109/MC.2019.2903246. Epub 2019 May 14.
5. Property-Constrained Dual Learning for Video Summarization. IEEE Trans Neural Netw Learn Syst. 2020 Oct;31(10):3989-4000. doi: 10.1109/TNNLS.2019.2951680. Epub 2019 Dec 5.
6. Deep Attention Network for Egocentric Action Recognition. IEEE Trans Image Process. 2019 Aug;28(8):3703-3713. doi: 10.1109/TIP.2019.2901707. Epub 2019 Feb 26.
7. In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation and Beyond. Int J Comput Vis. 2024;132(3):854-871. doi: 10.1007/s11263-023-01879-7. Epub 2023 Oct 18.
8. Delving into Egocentric Actions. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2015 Jun;2015:287-295. doi: 10.1109/CVPR.2015.7298625.
9. Diversity-Aware Multi-Video Summarization. IEEE Trans Image Process. 2017 Oct;26(10):4712-4724. doi: 10.1109/TIP.2017.2708902. Epub 2017 May 26.
10. Domain independent redundancy elimination based on flow vectors for static video summarization. Heliyon. 2019 Nov 1;5(10):e02699. doi: 10.1016/j.heliyon.2019.e02699. eCollection 2019 Oct.

Cited By

1. Glimpse: A Gaze-Based Measure of Temporal Salience. Sensors (Basel). 2021 Apr 29;21(9):3099. doi: 10.3390/s21093099.
2. Human Eye Movements Reveal Video Frame Importance. Computer (Long Beach Calif). 2019 May;52(5):48-57. doi: 10.1109/MC.2019.2903246. Epub 2019 May 14.

