School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China.
Lancaster Environment Centre, Lancaster University, Lancaster LA1 4YQ, UK.
Sensors (Basel). 2021 Jan 28;21(3):888. doi: 10.3390/s21030888.
Accurate and efficient text detection in natural scenes is a fundamental yet challenging task in computer vision, especially when the text is arbitrarily oriented. Most contemporary text detection methods are designed to identify horizontal or approximately horizontal text, which cannot satisfy practical detection requirements for real-world imagery such as image streams or videos. To address this gap, we propose Rotational You Only Look Once (R-YOLO), a robust real-time convolutional neural network (CNN) model that detects arbitrarily oriented text in natural scene images. First, a rotated anchor box with angle information is used as the text bounding box so that text of various orientations can be covered. Second, features at multiple scales are extracted from the input image to determine the probability, confidence, and inclined bounding boxes of the text. Finally, Rotational Distance Intersection over Union Non-Maximum Suppression is used to eliminate redundant detections and retain the most accurate ones. Benchmark experiments are conducted on four popular datasets: ICDAR2015, ICDAR2013, MSRA-TD500, and ICDAR2017-MLT. The results indicate that the proposed R-YOLO method significantly outperforms state-of-the-art methods in detection efficiency while maintaining high accuracy; for example, it achieves an F-measure of 82.3% at 62.5 fps with 720p resolution on the ICDAR2015 dataset.
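The final suppression step can be illustrated with a minimal sketch. It is not the authors' implementation: the box layout `(cx, cy, w, h, angle)` with the angle in radians, the use of an axis-aligned enclosing box for the distance penalty, the default threshold of 0.5, and all function names are assumptions made for illustration. The overlap of two rotated rectangles is computed with Sutherland-Hodgman polygon clipping, and a box is suppressed when its IoU with a kept box, minus the normalized squared center distance, reaches the threshold.

```python
import math

def corners(box):
    """Counter-clockwise corners of a rotated box (cx, cy, w, h, angle_rad)."""
    cx, cy, w, h, a = box
    c, s = math.cos(a), math.sin(a)
    offsets = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]

def polygon_area(poly):
    """Shoelace formula; returns 0.0 for degenerate polygons."""
    n = len(poly)
    total = 0.0
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def clip(subject, clipper):
    """Sutherland-Hodgman: clip a polygon against a convex, CCW clipper."""
    out = list(subject)
    n = len(clipper)
    for i in range(n):
        if not out:
            break
        (ax, ay), (bx, by) = clipper[i], clipper[(i + 1) % n]
        inp, out = out, []
        for j in range(len(inp)):
            p, q = inp[j], inp[(j + 1) % len(inp)]
            # Signed side of p and q relative to edge a->b (>= 0 means inside).
            sp = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
            sq = (bx - ax) * (q[1] - ay) - (by - ay) * (q[0] - ax)
            if sp >= 0:
                out.append(p)
            if (sp >= 0) != (sq >= 0):
                t = sp / (sp - sq)
                out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out

def rotated_diou(b1, b2):
    """IoU of two rotated boxes minus the normalized squared center distance."""
    p1, p2 = corners(b1), corners(b2)
    inter = polygon_area(clip(p1, p2))
    union = b1[2] * b1[3] + b2[2] * b2[3] - inter
    iou = inter / union if union > 0 else 0.0
    # Assumption: the enclosing box is axis-aligned around both rectangles.
    xs = [x for x, _ in p1 + p2]
    ys = [y for _, y in p1 + p2]
    diag2 = (max(xs) - min(xs)) ** 2 + (max(ys) - min(ys)) ** 2
    dist2 = (b1[0] - b2[0]) ** 2 + (b1[1] - b2[1]) ** 2
    return iou - (dist2 / diag2 if diag2 > 0 else 0.0)

def rotated_diou_nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep boxes in descending score order unless suppressed."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(rotated_diou(boxes[i], boxes[k]) < thresh for k in keep):
            keep.append(i)
    return keep

if __name__ == "__main__":
    boxes = [(0, 0, 4, 2, 0.0), (0.1, 0, 4, 2, 0.05), (10, 10, 4, 2, math.pi / 4)]
    scores = [0.9, 0.8, 0.7]
    print(rotated_diou_nms(boxes, scores))
```

In the usage example, the second box nearly coincides with the first and has a lower score, so it is the one expected to be suppressed, while the distant third box is retained. The greedy loop is quadratic in the number of boxes, which is acceptable for a sketch but would normally be vectorized in a real-time detector.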