IEEE Trans Image Process. 2017 May;26(5):2261-2273. doi: 10.1109/TIP.2017.2678800. Epub 2017 Mar 6.
Video analysis and understanding play a central role in visual intelligence. In this paper, we aim to analyze unconstrained videos by designing features and approaches that represent and analyze the videography styles in them. Videography denotes the process of making videos. Unconstrained videos are defined as long-duration consumer videos that usually contain diverse editing artifacts and highly complex content. We propose to construct a videography dictionary, which can be used to represent every video clip as a sequence of videography words. In addition to semantic features, such as foreground object motion and camera motion, we incorporate two novel interpretable features to characterize videography: scale information and motion correlations. We then demonstrate that, by using statistical analysis methods, the unique videography signatures extracted from different events can be identified automatically. For real-world applications, we explore the use of videography analysis in three types of applications: content-based video retrieval, video summarization (both visual and textual), and videography-based feature pooling. In the experiments, we evaluate the performance of our approach and other methods on a large-scale unconstrained video dataset, and show that the proposed approach significantly benefits video analysis in various ways.
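The idea of a videography dictionary, in which each clip becomes a sequence of videography words, can be illustrated with a generic bag-of-words quantization. The abstract does not say how the dictionary is built, so this is a minimal sketch under stated assumptions, not the authors' method. It assumes that each clip has already been split into short segments and that each segment is described by a fixed-length feature vector. The four feature slots (camera motion, foreground motion, scale, motion correlation) and all function names are hypothetical. Plain k-means stands in for whatever clustering the paper actually uses.

```python
import random
from typing import List, Sequence

# Hypothetical per-segment descriptor:
# (camera_motion, foreground_motion, scale, motion_correlation)
Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(v: Vector, dictionary: List[List[float]]) -> int:
    """Index of the dictionary entry (videography word) closest to v."""
    return min(range(len(dictionary)), key=lambda k: squared_distance(v, dictionary[k]))


def build_dictionary(samples: List[Vector], k: int,
                     iterations: int = 20, seed: int = 0) -> List[List[float]]:
    """Learn k codewords with plain k-means (an assumed stand-in technique)."""
    rng = random.Random(seed)
    centers = [list(s) for s in rng.sample(samples, k)]  # requires k <= len(samples)
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for s in samples:
            clusters[nearest_word(s, centers)].append(s)
        for j, members in enumerate(clusters):
            if members:  # keep the previous center if a cluster goes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def encode_clip(segments: List[Vector], dictionary: List[List[float]]) -> List[int]:
    """Represent a clip as the sequence of its segments' nearest videography words."""
    return [nearest_word(s, dictionary) for s in segments]


# Toy usage: two "still camera" segments followed by two "fast pan" segments.
segments = [
    (0.1, 0.2, 0.5, 0.1),
    (0.0, 0.1, 0.6, 0.2),
    (0.9, 0.8, 0.2, 0.9),
    (1.0, 0.9, 0.1, 0.8),
]
dictionary = build_dictionary(segments, k=2)
words = encode_clip(segments, dictionary)
print(words)  # similar segments share a word; the word ids depend on initialization
```

In practice the features would need to be normalized before clustering, because their scales differ. The resulting word sequences could then feed the statistical analysis, retrieval, or pooling steps that the abstract mentions, though the paper's exact procedures are not given here.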