School of Electronic, Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China.
IEEE Trans Image Process. 2011 Mar;20(3):790-9. doi: 10.1109/TIP.2010.2068553. Epub 2010 Aug 19.
Detecting text and captions in videos is important, and in great demand, for video retrieval, annotation, indexing, and content analysis. In this paper, we present a corner-based approach to detecting text and captions in videos. The approach is inspired by the observation that corner points occur densely and in an orderly arrangement within characters, especially in text and captions. We use several discriminative features to describe the text regions formed by the corner points. These features can be applied flexibly and can therefore be adapted to different applications. Language independence is an important advantage of the proposed method. Moreover, building on the text features, we develop a novel algorithm to detect moving captions in videos. In this algorithm, motion features extracted by optical flow are combined with the text features to detect moving caption patterns, and a decision tree is adopted to learn the classification criteria. Experiments conducted on a large volume of real video shots demonstrate the efficiency and robustness of the proposed approaches and of the real-world system. Our text and caption detection system was recently highlighted in a worldwide multimedia retrieval competition, Star Challenge, where it achieved top-ranked performance.
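The abstract does not name the corner detector or list the region features, so the following is only a minimal pure-Python sketch of the core intuition (text regions contain many corner points close together), built on our own assumptions. It uses a Harris-style corner response (k = 0.04) with an unweighted 3x3 window, a relative threshold with 3x3 non-maximum suppression, and a fixed tile grid in which a tile is flagged as a text candidate when it holds at least `min_corners` corners. The function names, tile size, and thresholds are hypothetical. The sketch models corner density only; it does not reproduce the paper's discriminative region features, its use of the "orderly" corner layout, or the optical-flow and decision-tree stages for moving captions.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale frame, row-major, values in [0, 1]


def harris_response(img: Image, k: float = 0.04) -> Image:
    """Per-pixel Harris response R = det(M) - k * trace(M)**2.

    Gradients are central differences; the structure tensor M sums
    gradient products over an unweighted 3x3 window (a simplification).
    Pixels within 2 of the border keep a response of 0.
    """
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for y in range(2, h - 2):
        for x in range(2, w - 2):
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            resp[y][x] = (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2
    return resp


def detect_corners(img: Image, rel_thresh: float = 0.1) -> List[Tuple[int, int]]:
    """Return (y, x) of pixels whose response exceeds rel_thresh * peak
    response and is not smaller than any of their 8 neighbours."""
    resp = harris_response(img)
    h, w = len(resp), len(resp[0])
    peak = max(max(row) for row in resp)
    if peak <= 0:  # flat frame: no corners at all
        return []
    thr = rel_thresh * peak
    corners = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            r = resp[y][x]
            if r > thr and all(
                r >= resp[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ):
                corners.append((y, x))
    return corners


def candidate_text_blocks(
    img: Image, block: int = 8, min_corners: int = 8
) -> List[Tuple[int, int]]:
    """Return top-left (y, x) of non-overlapping block x block tiles that
    contain at least min_corners corners (hypothetical density rule)."""
    counts = {}
    for y, x in detect_corners(img):
        key = (y // block * block, x // block * block)
        counts[key] = counts.get(key, 0) + 1
    return sorted(key for key, n in counts.items() if n >= min_corners)


if __name__ == "__main__":
    # Toy frame: a fine checkerboard (corner-rich, standing in for text)
    # on the left half and a flat background on the right half.
    frame = [
        [float((x // 2 + y // 2) % 2) if x < 16 else 0.0 for x in range(32)]
        for y in range(16)
    ]
    print(candidate_text_blocks(frame))
```

On the toy frame, only tiles over the checkerboard half are flagged; tiles deep in the flat half have zero response and therefore no corners. A practical detector would replace the fixed tile grid with connected regions of corners and then score those regions with shape features, which is the role the abstract assigns to its discriminative features.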