Song Xiaomu, Fan Guoliang
Evanston Northwestern Healthcare Research Institute and Northwestern University, Evanston, IL 60201, USA.
IEEE Trans Image Process. 2007 Dec;16(12):3035-46. doi: 10.1109/tip.2007.908283.
We propose a new statistical generative model for spatiotemporal video segmentation. The objective is to partition a video sequence into homogeneous segments that can serve as "building blocks" for semantic video segmentation. The baseline framework is a Gaussian mixture model (GMM)-based video modeling approach that operates in a six-dimensional spatiotemporal feature space. Specifically, we introduce the concept of frame saliency to quantify the relevance of a video frame to GMM-based spatiotemporal video modeling. This allows a small set of salient frames to facilitate model training by reducing data redundancy and irrelevance. A modified expectation-maximization (EM) algorithm is developed for simultaneous GMM training and frame saliency estimation, and the frames with the highest saliency values are extracted to refine the GMM estimation for video segmentation. Interestingly, frame saliency is also found to be indicative of certain object behaviors, which makes the proposed method applicable to other frame-related video analysis tasks, such as key-frame extraction and video skimming. Experiments on real videos demonstrate the effectiveness and efficiency of the proposed method.
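To make the baseline framework concrete, the sketch below fits a diagonal-covariance GMM to six-dimensional spatiotemporal pixel features (x, y, t, R, G, B) and segments pixels by their most likely component. This is a minimal illustration, not the paper's algorithm: it uses plain EM rather than the modified EM, and it approximates frame saliency by each frame's average pixel log-likelihood under the fitted model; all function names, the saliency proxy, and the toy data are assumptions for illustration.

```python
import numpy as np

def make_features(video):
    """Stack (x, y, t, R, G, B) per pixel -- a 6-D spatiotemporal feature space."""
    T, H, W, _ = video.shape
    t, y, x = np.meshgrid(np.arange(T), np.arange(H), np.arange(W), indexing="ij")
    coords = np.stack([x.ravel(), y.ravel(), t.ravel()], axis=1).astype(float)
    return np.hstack([coords, video.reshape(-1, 3)])

def log_densities(X, pi, mu, var):
    """Per point and component: log pi_k + log N(x | mu_k, diag(var_k))."""
    quad = ((X[:, None, :] - mu) ** 2) / var + np.log(2 * np.pi * var)
    return np.log(pi) - 0.5 * quad.sum(axis=-1)

def fit_gmm(X, K=2, iters=30, seed=0):
    """Plain EM for a diagonal-covariance GMM (not the paper's modified EM)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X[rng.choice(N, K, replace=False)].copy()   # init means at random points
    var = np.tile(X.var(axis=0) + 1e-6, (K, 1))
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        lp = log_densities(X, pi, mu, var)           # E-step: responsibilities
        lp -= lp.max(axis=1, keepdims=True)
        r = np.exp(lp)
        r /= r.sum(axis=1, keepdims=True)
        Nk = r.sum(axis=0) + 1e-12                   # M-step: weighted updates
        pi = Nk / N
        mu = (r.T @ X) / Nk[:, None]
        var = (r.T @ X**2) / Nk[:, None] - mu**2 + 1e-6
    return pi, mu, var

def segment_and_saliency(video, pi, mu, var):
    X = make_features(video)
    lp = log_densities(X, pi, mu, var)
    labels = lp.argmax(axis=1)                       # pixel-to-component segmentation
    m = lp.max(axis=1)                               # log-sum-exp for total likelihood
    ll = m + np.log(np.exp(lp - m[:, None]).sum(axis=1))
    T = video.shape[0]
    saliency = ll.reshape(T, -1).mean(axis=1)        # crude per-frame "saliency" proxy
    return labels, saliency

# Toy example: a bright patch drifting across a dark 8x8 background over 4 frames.
T, H, W = 4, 8, 8
video = np.zeros((T, H, W, 3))
for t in range(T):
    video[t, 2:5, t:t + 3, :] = 1.0

pi, mu, var = fit_gmm(make_features(video), K=2)
labels, saliency = segment_and_saliency(video, pi, mu, var)
```

In the full method, the saliency values would instead be estimated jointly with the mixture parameters inside the modified EM loop, and the highest-saliency frames would be used to refine the model; the proxy above only illustrates how a per-frame relevance score can be read off a fitted spatiotemporal GMM.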