IEEE Trans Image Process. 2013 Nov;22(11):4460-72. doi: 10.1109/TIP.2013.2273672.
We present a two-stage framework for automatic video text removal that detects and removes embedded video text, then fills the resulting regions with appropriate data. In the video text detection stage, text locations in each frame are found via unsupervised clustering performed on the connected components produced by the stroke width transform (SWT). Since the SWT needs an accurate edge map, we develop a novel edge detector that benefits from the geometric features revealed by the bandlet transform. Next, the motion patterns of the text objects in each frame are analyzed to localize video text. The detected video text regions are removed, and the video is then restored by an inpainting scheme. The proposed video inpainting approach applies spatio-temporal geometric flows extracted by bandlets to reconstruct the missing data. A 3D volume regularization algorithm, which takes advantage of bandlet bases in exploiting anisotropic regularities, is introduced to carry out the inpainting task. The method needs no extra processing to maintain visual consistency. The experimental results demonstrate the effectiveness of both the proposed video text detection approach and the video completion technique, and consequently of the entire automatic video text removal and restoration process.
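The detection stage above groups SWT connected components by similar stroke width before clustering them into text candidates. A minimal sketch of that idea is shown below; it is not the paper's algorithm. The horizontal run length is used as a crude stand-in for the real SWT's gradient-direction ray shooting, and the grouping ratio of 1.5 is an assumed threshold, not a value from the paper.

```python
from collections import deque

def horizontal_stroke_widths(img):
    """Approximate per-pixel stroke width as the horizontal run length
    of foreground pixels (a crude proxy for the ray shooting of the
    real Stroke Width Transform)."""
    h, w = len(img), len(img[0])
    widths = [[0] * w for _ in range(h)]
    for y in range(h):
        x = 0
        while x < w:
            if img[y][x]:
                x0 = x
                while x < w and img[y][x]:
                    x += 1
                for xi in range(x0, x):
                    widths[y][xi] = x - x0  # run length of this stroke
            else:
                x += 1
    return widths

def connected_components(img):
    """4-connected foreground components via BFS; each component is a
    list of (row, col) pixels."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                queue, comp = deque([(y, x)]), []
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    comp.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                comps.append(comp)
    return comps

def group_by_stroke_width(img, ratio=1.5):
    """Greedy single-link grouping: components whose median stroke
    widths differ by less than `ratio` land in the same cluster.
    The threshold is an illustrative assumption."""
    widths = horizontal_stroke_widths(img)
    comps = connected_components(img)
    medians = []
    for comp in comps:
        ws = sorted(widths[y][x] for y, x in comp)
        medians.append(ws[len(ws) // 2])
    groups = []
    for i, m in enumerate(medians):
        for g in groups:
            anchor = medians[g[0]]
            if max(m, anchor) / min(m, anchor) < ratio:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups
```

On a toy binary image containing one thin (1-pixel-wide) stroke and one thick (5-pixel-wide) blob, the two components end up in separate clusters, which is the behavior text/non-text grouping relies on.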