Kim Wonjun, Kim Changick
Department of Electronic Engineering, Information and Communications University, Daejeon, Korea.
IEEE Trans Image Process. 2009 Feb;18(2):401-11. doi: 10.1109/TIP.2008.2008225. Epub 2008 Dec 16.
Overlay text provides important semantic clues for video content analysis tasks such as video information retrieval and summarization, since the content of the scene or the editor's intention can be well represented by the inserted text. Most previous approaches to extracting overlay text from videos are based on low-level features, such as edge, color, and texture information. However, existing methods have difficulty handling text with varying contrast or text inserted into a complex background. In this paper, we propose a novel framework to detect and extract overlay text from the video scene. Based on our observation that transient colors exist between inserted text and its adjacent background, a transition map is first generated. Candidate regions are then extracted by a reshaping method, and the overlay text regions are determined based on the occurrence of overlay text in each candidate. The detected overlay text regions are localized accurately using the projection of overlay text pixels in the transition map, and the text extraction is finally conducted. The proposed method is robust to different character sizes, positions, contrasts, and colors, and it is language independent. Overlay text region update between frames is also employed to reduce the processing time. Experiments are performed on diverse videos to confirm the efficiency of the proposed method.
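The transition-map idea above can be illustrated with a minimal sketch: mark pixels where the intensity changes sharply from one horizontal neighbor to the next, a simplified stand-in for the paper's transition-pixel test. The function name, threshold value, and use of a plain absolute-difference criterion are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def transition_map(gray, threshold=40):
    """Mark pixels whose horizontal intensity change to the next pixel
    exceeds `threshold`.

    This is a simplified, hypothetical version of a transition map:
    the original method uses a more elaborate change-of-intensity
    criterion, but the idea of flagging sharp text/background
    boundaries is the same.
    """
    gray = gray.astype(np.int32)
    diff = np.abs(np.diff(gray, axis=1))       # |I(y, x+1) - I(y, x)|
    tmap = np.zeros(gray.shape, dtype=np.uint8)
    tmap[:, :-1] = (diff >= threshold).astype(np.uint8)
    return tmap

# Toy frame: dark background with a bright inserted "text" stripe.
frame = np.full((6, 12), 20, dtype=np.uint8)
frame[2:4, 4:8] = 230
tmap = transition_map(frame)
# Rows crossing the stripe get transition pixels at its left and
# right boundaries; background-only rows stay empty.
```

In the real pipeline, connected runs of such transition pixels would then be grouped into candidate regions, and horizontal/vertical projections of the map would localize the text boxes.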