
Scene Text Detection and Segmentation Based on Cascaded Convolution Neural Networks

Author Information

Tang Youbao, Wu Xiangqian

Publication Information

IEEE Trans Image Process. 2017 Mar;26(3):1509-1520. doi: 10.1109/TIP.2017.2656474. Epub 2017 Jan 20.

Abstract

Scene text detection and segmentation are two important and challenging research problems in the field of computer vision. This paper proposes a novel method for scene text detection and segmentation based on cascaded convolutional neural networks (CNNs). In this method, a CNN-based text-aware candidate text region (CTR) extraction model (named the detection network, DNet) is designed and trained using both the edges and the whole regions of text, with which coarse CTRs are detected. A CNN-based CTR refinement model (named the segmentation network, SNet) is then constructed to precisely segment the coarse CTRs into text and obtain the refined CTRs. With DNet and SNet, far fewer CTRs are extracted than with traditional approaches, while more true text regions are kept. The refined CTRs are finally classified using a CNN-based CTR classification model (named the classification network, CNet) to obtain the final text regions. All of these CNN-based models are modified from VGGNet-16. Extensive experiments on three benchmark datasets demonstrate that the proposed method achieves state-of-the-art performance and greatly outperforms other scene text detection and segmentation approaches.
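The abstract describes a three-stage cascade (DNet for coarse detection, SNet for segmentation-based refinement, CNet for text/non-text classification), all modified from VGGNet-16, but does not reproduce any implementation. The sketch below shows how such a cascade could be wired together in PyTorch; every class and helper name here (DNet, SNet, CNet, vgg16_trunk, extract_ctrs, detect_text) is a hypothetical illustration assuming torchvision's VGG-16 as the backbone, not the authors' code.

```python
# Minimal sketch of the three-stage cascade described in the abstract.
# DNet, SNet, CNet and the helper names are hypothetical stand-ins, not
# the authors' released code; the backbone assumption is torchvision's
# VGG-16, since the abstract says all models are modified from VGGNet-16.
import torch
import torch.nn as nn
import torchvision.models as models


def vgg16_trunk() -> nn.Module:
    """Convolutional trunk of VGG-16 (outputs a 512-channel feature map)."""
    return models.vgg16(weights=None).features


class DNet(nn.Module):
    """Detection network: emits coarse text-edge and text-region maps.

    The abstract says DNet is trained on both text edges and whole text
    regions, hence the two output channels in this sketch.
    """

    def __init__(self) -> None:
        super().__init__()
        self.backbone = vgg16_trunk()
        self.head = nn.Conv2d(512, 2, kernel_size=1)  # [edge, region] logits

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(image))


class SNet(nn.Module):
    """Segmentation network: refines one coarse CTR crop into a text mask."""

    def __init__(self) -> None:
        super().__init__()
        self.backbone = vgg16_trunk()
        self.head = nn.Conv2d(512, 1, kernel_size=1)  # per-pixel text logit

    def forward(self, ctr_crop: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(ctr_crop))


class CNet(nn.Module):
    """Classification network: scores a refined CTR as text vs. non-text."""

    def __init__(self) -> None:
        super().__init__()
        self.backbone = vgg16_trunk()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(512, 2)  # [non-text, text] logits

    def forward(self, ctr_crop: torch.Tensor) -> torch.Tensor:
        feats = self.pool(self.backbone(ctr_crop)).flatten(1)
        return self.fc(feats)


def detect_text(image, dnet, snet, cnet, extract_ctrs, keep_thresh=0.5):
    """Run the cascade: coarse CTRs -> refined masks -> text filtering.

    `extract_ctrs` is a placeholder for the step that turns DNet's maps
    into cropped candidate regions (e.g. via connected components).
    """
    heat_maps = dnet(image)
    results = []
    for crop in extract_ctrs(heat_maps, image):   # coarse CTR crops
        mask = torch.sigmoid(snet(crop))          # refined segmentation
        text_prob = torch.softmax(cnet(crop), dim=1)[0, 1]
        if text_prob.item() > keep_thresh:        # keep confident text regions
            results.append((crop, mask, text_prob.item()))
    return results
```

The sketch only captures the data flow of the cascade: the step that converts DNet's coarse maps into CTR crops is deliberately left as a placeholder, since the abstract does not specify how the paper implements it.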

