IEEE Trans Image Process. 2015 Nov;24(11):3742-53. doi: 10.1109/TIP.2015.2445572. Epub 2015 Jun 15.
Key frame extraction algorithms address the problem of selecting a subset of the most informative frames from a video to summarize its content. Several applications, such as video summarization, search, indexing, and prints from video, can benefit from key frames extracted from the video under consideration. Most approaches in this class of algorithms work directly with the input video data set, without considering its underlying low-rank structure. Other algorithms exploit only the low-rank component and ignore the other key information in the video. In this paper, a novel key frame extraction framework based on robust principal component analysis (RPCA) is proposed. Furthermore, we target the challenging application of extracting key frames from unstructured consumer videos. The proposed framework is motivated by the observation that RPCA decomposes input data into: 1) a low-rank component that reveals the systematic information across the elements of the data set and 2) a set of sparse components, each of which contains distinct information about one element of the same data set. The two information types are combined into a single ℓ1-norm-based non-convex optimization problem to extract the desired number of key frames. Moreover, we develop a novel iterative algorithm to solve this optimization problem. The proposed RPCA-based framework does not require shot detection, segmentation, or semantic understanding of the underlying video. Finally, experiments are performed on a variety of consumer and other types of videos. A comparison of the results obtained by our method with the ground truth and with related state-of-the-art algorithms clearly illustrates the viability of the proposed RPCA-based framework.
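The RPCA decomposition that motivates the framework can be illustrated with a minimal sketch. It assumes each video frame is vectorized into one column of a matrix D and uses the standard convex principal component pursuit formulation, solved with a basic augmented-Lagrangian iteration. The function names (`rpca`, `naive_key_frames`), the default parameters, and the column-scoring rule are assumptions made for illustration only. In particular, `naive_key_frames` simply ranks frames by the ℓ1 energy of their sparse component; it is a stand-in and not the non-convex optimization problem or the iterative algorithm proposed in the paper.

```python
import numpy as np

def shrink(X, tau):
    """Elementwise soft-thresholding (proximal operator of the l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Singular value thresholding (proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def rpca(D, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split D into a low-rank part L and a sparse part S with D ~= L + S.

    Assumes D is a nonzero 2-D array with one vectorized frame per column.
    The default lam and mu are common heuristic choices, not values from the paper.
    """
    D = np.asarray(D, dtype=float)
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    mu = m * n / (4.0 * np.abs(D).sum()) if mu is None else mu
    S = np.zeros_like(D)
    Y = np.zeros_like(D)  # Lagrange multipliers for the constraint D = L + S
    norm_D = np.linalg.norm(D)
    for _ in range(max_iter):
        L = svd_threshold(D - S + Y / mu, 1.0 / mu)  # update low-rank part
        S = shrink(D - L + Y / mu, lam / mu)         # update sparse part
        R = D - L - S                                # constraint residual
        Y = Y + mu * R
        if np.linalg.norm(R) <= tol * norm_D:
            break
    return L, S

def naive_key_frames(S, k):
    """Hypothetical stand-in: indices of the k frames (columns) whose sparse
    component has the largest l1 norm, returned in ascending order."""
    scores = np.abs(S).sum(axis=0)
    return np.sort(np.argsort(scores)[::-1][:k])
```

For a grayscale video stored as an array of shape (frames, height, width), D could be formed as `video.reshape(num_frames, -1).T`, after which `rpca(D)` returns the two components and `naive_key_frames(S, k)` gives k candidate frame indices under the simplified scoring rule.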