Video Summarization Based on Mutual Information and Entropy Sliding Window Method

Author information

Li WenLin, Qi DeYu, Zhang ChangJian, Guo Jing, Yao JiaJun

Affiliations

School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China.

School of Software Engineering, South China University of Technology, Guangzhou 510006, China.

Publication information

Entropy (Basel). 2020 Nov 12;22(11):1285. doi: 10.3390/e22111285.

PMID:33287053

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7711815/

Abstract

This paper proposes a video summarization algorithm, the Mutual Information and Entropy based adaptive Sliding Window (MIESW) method, designed specifically for static summarization of gesture videos. Because gesture videos often contain uncertain transition postures, unclear movement boundaries, and frames that are hard to interpret, we propose a three-step method: the first step reads the video, the second applies the MIESW method to select candidate key frames, and the third removes most redundant key frames. In detail, the first step converts the video into a sequence of frames and resizes the frames. The second step runs the MIESW key frame extraction algorithm: the inter-frame mutual information is used as a metric to adaptively adjust the size of the sliding window so that similar video content is grouped together; then, based on the entropy of each frame and the average mutual information of each frame group, a threshold method refines the grouping and the key frames are extracted. In the third step, speeded-up robust features (SURF) analysis eliminates redundant frames among the candidate key frames. The calculation of Precision, Recall, and F-measure is optimized from the perspective of practicality and feasibility. Experiments demonstrate that the key frames extracted by our method provide high-quality video summaries and largely cover the main content of the gesture videos.
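
The two quantities at the core of the second step, frame entropy and inter-frame mutual information, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives neither the threshold values nor the exact window-adjustment and key-frame selection rules, so the function names, the `mi_threshold` parameter, and the "highest-entropy frame per group" rule below are all assumptions made for illustration. Frames are assumed to be 8-bit grayscale NumPy arrays of equal size, and the SURF-based redundancy removal of the third step is omitted.

```python
import numpy as np

def frame_entropy(frame, bins=256):
    """Shannon entropy (in bits) of a grayscale frame's intensity histogram."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def mutual_information(a, b, bins=256):
    """Mutual information (in bits) between the intensities of two same-sized frames."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins,
                                 range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()            # joint intensity distribution
    px = pxy.sum(axis=1, keepdims=True)  # marginal of frame a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of frame b, shape (1, bins)
    nz = pxy > 0                         # skip empty cells to avoid log(0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

def group_frames(frames, mi_threshold):
    """Assumed grouping rule: extend the current window while consecutive
    frames share at least mi_threshold bits, otherwise open a new group."""
    groups = [[0]]
    for i in range(1, len(frames)):
        if mutual_information(frames[i - 1], frames[i]) >= mi_threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups

def pick_key_frames(frames, groups):
    """Hypothetical selection rule: the highest-entropy frame of each group."""
    return [max(g, key=lambda i: frame_entropy(frames[i])) for g in groups]
```

With a frame list `frames`, a call such as `pick_key_frames(frames, group_frames(frames, mi_threshold=0.5))` returns one candidate index per group. The threshold value here is arbitrary and would need tuning on real footage. Histogram-based mutual information estimates are also biased upward on small or noisy frames.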
