Liu Yuliang, Jin Lianwen, Fang Chuanming
IEEE Trans Image Process. 2019 Nov 26. doi: 10.1109/TIP.2019.2954218.
Scene text in natural environments is complex: it can appear in arbitrary fonts, sizes, or shapes. Although scene text detection has made considerable progress in recent years, detecting text with complex shapes, especially curved text, remains challenging. Datasets with enough samples to address curved text (or other irregularly shaped text) have been introduced only recently, and the performance of the reported methods on these datasets is unsatisfactory. Detecting arbitrarily shaped text therefore remains a challenging task. This motivated us to propose the Mask Tightness Text Detector (Mask TTD) to improve text detection performance. Mask TTD uses a tightness prior and text frontier learning to enhance pixel-wise mask prediction. In addition, it integrates a branch that predicts the polygonal boundary of each text region; the mask and boundary branches promote each other, which significantly improves the detection of arbitrarily shaped text. Experiments demonstrate that Mask TTD achieves state-of-the-art performance on existing curved text datasets (CTW1500, Total-text, and CUTE80) and on three common benchmark datasets (RCTW-17, MSRA-TD500, and ICDAR 2015). Notably, on CTW1500 our method outperforms previous methods, especially at higher intersection over union (IoU) thresholds (16% higher than the next-best method at an IoU threshold of 0.8), which demonstrates its potential for tight text detection. Moreover, on RCTW-17, the largest Chinese-based dataset, Mask TTD outperforms other methods by a large margin in terms of both Average Precision and F-measure, showing its strong generalization ability.
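The tightness claim above rests on how a detection is scored at IoU 0.5 versus 0.8. The following minimal sketch illustrates that evaluation step under stated assumptions. It is not the authors' code or any dataset's official evaluation script. IoU is approximated by rasterizing polygons onto a unit pixel grid, whereas official protocols compute polygon overlap geometrically. Matching is a simple greedy one-to-one assignment. All function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]


def point_in_polygon(x: float, y: float, poly: Polygon) -> bool:
    """Even-odd ray-casting test; also valid for non-convex (e.g. curved-text) polygons."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(poly: Polygon) -> Set[Tuple[int, int]]:
    """Approximation (assumption): the set of unit pixels whose centres lie inside poly."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    pixels = set()
    for px in range(int(min(xs)), int(max(xs)) + 1):
        for py in range(int(min(ys)), int(max(ys)) + 1):
            if point_in_polygon(px + 0.5, py + 0.5, poly):
                pixels.add((px, py))
    return pixels


def raster_iou(a: Polygon, b: Polygon) -> float:
    """Pixel-level intersection over union of two polygons."""
    pa, pb = rasterize(a), rasterize(b)
    union = len(pa | pb)
    return len(pa & pb) / union if union else 0.0


def f_measure(dets: List[Polygon], gts: List[Polygon],
              thr: float) -> Tuple[float, float, float]:
    """Greedy matching: a detection is a true positive if its best IoU with a
    still-unmatched ground-truth polygon reaches the threshold thr."""
    matched_gt = set()
    tp = 0
    for d in dets:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gts):
            if j in matched_gt:
                continue
            iou = raster_iou(d, g)
            if iou > best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0 and best_iou >= thr:
            matched_gt.add(best_j)
            tp += 1
    precision = tp / len(dets) if dets else 0.0
    recall = tp / len(gts) if gts else 0.0
    f = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f


if __name__ == "__main__":
    gt = [(0, 0), (100, 0), (100, 20), (0, 20)]
    loose = [(0, 0), (100, 0), (100, 28), (0, 28)]  # covers the text plus extra background
    tight = [(0, 0), (100, 0), (100, 22), (0, 22)]
    print(round(raster_iou(loose, gt), 3))  # 0.714
    print(round(raster_iou(tight, gt), 3))  # 0.909
    print(f_measure([loose], [gt], 0.5))    # (1.0, 1.0, 1.0)
    print(f_measure([loose], [gt], 0.8))    # (0.0, 0.0, 0.0)
```

The loose box counts as a correct detection at IoU 0.5 but fails at 0.8, while the tight box passes both thresholds. This is why results at higher IoU thresholds are a sharper test of how tightly a detector fits the text.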