Automated hand-marked semantic text recognition from photographs.

Authors

Suh Seungah, Lee Ghang, Gil Daeyoung, Kim Yonghan

Affiliation

Department of Architecture and Architectural Engineering, Yonsei University, Seoul, 03722, Republic of Korea.

Publication information

Sci Rep. 2023 Aug 30;13(1):14240. doi: 10.1038/s41598-023-41489-4.

DOI:10.1038/s41598-023-41489-4

PMID:37648714

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10469204/

Abstract

Automated text recognition techniques have made significant advancements; however, certain tasks still present challenges. This study is motivated by the need to automatically recognize hand-marked text on construction defect tags among millions of photographs. To address this challenge, we investigated three methods for automating hand-marked semantic text recognition (HMSTR): a modified scene text recognition (STR)-based approach, a two-step HMSTR approach, and a lumped approach. The STR approach involves locating marked text using an object detection model and recognizing it using a competition-winning STR model. Similarly, the two-step HMSTR approach first localizes the marked text and then recognizes the semantic text using an image classification model. By contrast, the lumped approach performs both localization and identification of marked semantic text in a single step using object detection. Among these approaches, the two-step HMSTR approach achieved the highest F1 score (0.92) for recognizing circled text, followed by the STR approach (0.87) and the lumped approach (0.78). To validate the generalizability of the two-step HMSTR approach, subsequent experiments were conducted using check-marked text, resulting in an F1 score of 0.88. Although the proposed methods have been tested specifically with tags, they can be extended to recognize marked text in reports or books.
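
The control flow of the two-step HMSTR approach described above (localize the marked text, then classify each localized region) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `two_step_hmstr`, `localize`, `classify`, `crop`, and `Recognition` are hypothetical, the image is a plain nested list standing in for a real image array, and the detector and classifier are passed in as ordinary callables rather than trained models. The `f1_score` helper uses the standard definition F1 = 2TP / (2TP + FP + FN).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# A box is (left, top, right, bottom) in pixel coordinates (hypothetical convention).
Box = Tuple[int, int, int, int]
# An image is a 2-D grid of pixel values; a stand-in for a real image array.
Image = Sequence[Sequence[int]]


@dataclass
class Recognition:
    box: Box      # where the marked text was found
    label: str    # semantic class assigned to that region


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the region described by box out of the image."""
    left, top, right, bottom = box
    return [list(row[left:right]) for row in image[top:bottom]]


def two_step_hmstr(
    image: Image,
    localize: Callable[[Image], List[Box]],
    classify: Callable[[List[List[int]]], str],
) -> List[Recognition]:
    """Step 1: localize marked regions. Step 2: classify each cropped region."""
    results = []
    for box in localize(image):
        patch = crop(image, box)
        results.append(Recognition(box=box, label=classify(patch)))
    return results


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

In practice, `localize` would wrap an object detection model and `classify` an image classification model, as the abstract describes; the lumped approach would instead have a single detector return both the box and the label in one call.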
