Historical Text Line Segmentation Using Deep Learning Algorithms: Mask-RCNN against U-Net Networks

Authors

Fizaine Florian Côme, Bard Patrick, Paindavoine Michel, Robin Cécile, Bouyé Edouard, Lefèvre Raphaël, Vinter Annie

Affiliations

LEAD-CNRS, Université de Bourgogne, 21000 Dijon, France.

Archives Départementales de Côte d'Or, 21000 Dijon, France.

Publication information

J Imaging. 2024 Mar 5;10(3):65. doi: 10.3390/jimaging10030065.

DOI:10.3390/jimaging10030065

PMID:38535145

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10971631/

Abstract

Text line segmentation is a necessary preliminary step before most text transcription algorithms are applied. The leading deep learning networks used in this context (ARU-Net, dhSegment, and Doc-UFCN) are based on the U-Net architecture. They are efficient, but they share the same design principle and therefore require a post-processing step to perform instance (e.g., text line) segmentation. In the present work, we test the advantages of Mask-RCNN, which is designed to perform instance segmentation directly. This work is the first to directly compare Mask-RCNN- and U-Net-based networks on text line segmentation of historical documents, and it shows the superiority of the former over the latter. Three studies were conducted: the first compared these networks on different historical databases, the second compared Mask-RCNN with Doc-UFCN on a private historical database, and the third compared the handwritten text recognition (HTR) performance of the tested networks. The results showed that Mask-RCNN outperformed ARU-Net, dhSegment, and Doc-UFCN on relevant line segmentation metrics, that performance evaluation should not focus on the raw masks generated by the networks, that light mask processing is a simple and efficient way to improve evaluation, and that Mask-RCNN leads to better HTR performance.
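To make the distinction between semantic and instance segmentation concrete, the sketch below shows one generic way a binary "text line" mask could be split into separate line instances, using 4-connected component labelling. This is an illustrative assumption only: it is not the post-processing used by ARU-Net, dhSegment, or Doc-UFCN, and the function name `label_instances` and the toy mask are hypothetical.

```python
from collections import deque


def label_instances(mask):
    """Label 4-connected foreground regions of a binary mask.

    mask: list of rows, each a list of 0/1 values (1 = text-line pixel).
    Returns (labels, count): labels has the same shape as mask and holds
    0 for background or an instance id in 1..count.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            # Skip background pixels and pixels that already have an id.
            if mask[y][x] != 1 or labels[y][x] != 0:
                continue
            # Start a new instance and flood-fill it breadth-first.
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] == 1 and labels[ny][nx] == 0:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Toy semantic mask (hypothetical data): two text lines separated by
# one background row.
toy_mask = [
    [1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1],
]
toy_labels, toy_count = label_instances(toy_mask)
```

In this simple scheme, two lines whose masks touch anywhere are merged into a single instance, which is one reason a post-processing step of this kind needs care. An instance segmentation network such as Mask-RCNN instead outputs one mask per detected line.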
