Barreiro-Garrido Alvaro, Ruiz-Parrado Victoria, Moreno A Belen, Velez Jose F
Higher Technical School of Computer Engineering, Universidad Rey Juan Carlos, c/Tulipan sn, Mostoles, 28922 Madrid, Spain.
Sensors (Basel). 2024 Jun 16;24(12):3892. doi: 10.3390/s24123892.
In offline handwritten text recognition, numerous normalization algorithms have been developed over the years as preprocessing steps applied to scanned images of handwritten text before automatic recognition models are run. These algorithms have proven effective at improving the overall performance of recognition architectures. However, many of them rely heavily on heuristic strategies that are not integrated with the recognition architecture itself. This paper introduces the use of a trainable Pix2Pix model, a specific type of conditional generative adversarial network, to normalize handwritten text images. The model can be integrated as the initial stage of any deep learning architecture designed for handwritten text recognition, which makes it possible to train the normalization and recognition components as a unified whole while still maintaining some interpretability of each module. The proposed normalization approach learns from a blend of heuristic transformations applied to text images, aiming to mitigate the impact of handwriting variability among different writers. As a result, it achieves slope and slant normalization, alongside other conventional preprocessing objectives such as normalizing the size of text ascenders and descenders. We demonstrate that the proposed architecture replicates, and in certain cases surpasses, the results of a widely used heuristic algorithm, both on two metrics and when integrated as the first step of a deep recognition architecture.
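As a rough illustration of the kind of heuristic transformation the abstract refers to, the sketch below removes slant from a small binary image by shearing its rows. It is not the authors' algorithm, nor the heuristic baseline they compare against; the function name `deslant`, the `slant` parameter (taken here to be the tangent of the slant angle, already estimated by some other means), and the list-of-lists image representation are all assumptions made for this example.

```python
def deslant(image, slant):
    """Shear a non-empty 2D image (list of rows) to undo a known slant.

    `slant` is a hypothetical parameter: the tangent of the slant angle.
    Positive values mean strokes lean to the right, so rows are shifted
    left by an amount proportional to their height above the bottom row.
    """
    height = len(image)
    width = len(image[0])
    out = [[0] * width for _ in range(height)]
    for row in range(height):
        # Number of pixels this row moves to the left (bottom row: 0).
        shift = round(slant * (height - 1 - row))
        for col in range(width):
            src = col + shift
            # Pixels that would come from outside the image stay background.
            if 0 <= src < width:
                out[row][col] = image[row][src]
    return out


# A 4x6 image containing one stroke that leans right at 45 degrees.
stroke = [[0] * 6 for _ in range(4)]
for r in range(4):
    stroke[r][4 - r] = 1

corrected = deslant(stroke, 1.0)
```

In this toy case the diagonal stroke ends up as a vertical line in a single column. A learned normalizer such as the one the abstract describes would be trained to reproduce the effect of transformations like this one, rather than applying a hand-set shear.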