
Robust spatiotemporal matching of electronic slides to presentation videos.

Affiliation

T J Watson Research Center, IBM, Armonk, NY 10504-1722, USA.

Publication information

IEEE Trans Image Process. 2011 Aug;20(8):2315-28. doi: 10.1109/TIP.2011.2109727. Epub 2011 Jan 31.

Abstract

We describe a robust and efficient method for automatically matching and time-aligning electronic slides to videos of corresponding presentations. Matching electronic slides to videos provides new methods for indexing, searching, and browsing videos in distance-learning applications. However, robust automatic matching is challenging due to varied frame composition, slide distortion, camera movement, low-quality video capture, and arbitrary slide sequences. Our fully automatic approach combines image-based matching of slides to video frames with a temporal model for slide changes and camera events. To address these challenges, we begin by extracting scale-invariant feature transform (SIFT) keypoints from both slides and video frames, and matching them subject to a consistent projective transformation (homography) by using random sample consensus (RANSAC). We use the initial set of matches to construct a background model and a binary classifier for separating video frames showing slides from those without. We then introduce a new matching scheme for exploiting less distinctive SIFT keypoints that enables us to tackle more difficult images. Finally, we improve upon the matching based on visual information by using estimated matching probabilities as part of a hidden Markov model (HMM) that integrates temporal information and detected camera operations. Detailed quantitative experiments characterize each part of our approach and demonstrate an average accuracy of over 95% on 13 presentation videos.
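The final step, integrating per-frame matching probabilities through an HMM, can be illustrated with a Viterbi decode. The sketch below is a minimal, hypothetical example and not the paper's exact model: states are slide indices, the emission probabilities stand in for the estimated per-frame matching probabilities, and a "sticky" transition matrix encodes the expectation that the displayed slide changes rarely between consecutive frames.

```python
import math


def viterbi(priors, transitions, emissions):
    """Most likely slide sequence for a video, given:

    priors[s]         -- P(slide s at frame 0)
    transitions[p][s] -- P(slide s at frame f | slide p at frame f-1)
    emissions[f][s]   -- matching probability of slide s for frame f
    (all probabilities assumed nonzero in this sketch)
    """
    n = len(priors)
    # delta[s]: best log-probability of any path ending in slide s
    delta = [math.log(priors[s]) + math.log(emissions[0][s]) for s in range(n)]
    back = []  # backpointers, one list of predecessors per frame
    for f in range(1, len(emissions)):
        new_delta, ptr = [], []
        for s in range(n):
            best_prev = max(range(n),
                            key=lambda p: delta[p] + math.log(transitions[p][s]))
            ptr.append(best_prev)
            new_delta.append(delta[best_prev]
                             + math.log(transitions[best_prev][s])
                             + math.log(emissions[f][s]))
        delta = new_delta
        back.append(ptr)
    # Backtrack from the best final state to recover the path.
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical toy example: two slides, six frames. Frame 2's matching
# probabilities weakly favor the wrong slide; the sticky transitions
# smooth over that single noisy frame.
priors = [0.9, 0.1]
transitions = [[0.9, 0.1], [0.1, 0.9]]
emissions = [[0.8, 0.2], [0.8, 0.2], [0.4, 0.6],
             [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
path = viterbi(priors, transitions, emissions)
# path == [0, 0, 0, 0, 1, 1]: frame 2 is corrected to slide 0,
# and the change to slide 1 is placed at frame 4.
```

This is why the HMM improves on purely visual matching: a frame whose visual evidence is weak or misleading is overridden by the temporal prior that slides persist across many frames.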

