Lins Rafael Dueire, Bernardino Rodrigo, Barboza Ricardo da Silva, De Oliveira Raimundo Correa
Centro de Informática, Universidade Federal de Pernambuco, Recife 50670-901, PE, Brazil.
Departamento de Computação, Universidade Federal Rural de Pernambuco, Recife 55815-060, PE, Brazil.
J Imaging. 2022 Oct 5;8(10):272. doi: 10.3390/jimaging8100272.
The intrinsic features of documents, such as paper color, texture, aging, translucency, and the kind of printing, typing, or handwriting, are important in deciding how to process and enhance their images. Image binarization is the process of producing a monochromatic image from its color version. It is a key step in the document processing pipeline. The recent Quality-Time Binarization Competitions for documents have shown that no single binarization algorithm is good for every kind of document image. This paper uses a sample of the texture of scanned historical documents as the main document feature to select which of 63 widely used algorithms, each applied to five different versions of the input image (315 document image-binarization schemes in total), provides a reasonable quality-time trade-off.
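To make the size of the search space and the selection idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method of the paper. The abstract does not list the five input versions or the 63 algorithms, so the version labels and the `algorithm_NN` names are placeholders. The mean-color feature, the `REFERENCE` table, and the function `select_scheme` are likewise hypothetical stand-ins for whatever texture descriptor and matching procedure the authors actually use.

```python
from itertools import product

# Assumption: the abstract does not name the five input versions;
# these labels are illustrative only.
INPUT_VERSIONS = ["color", "gray", "red", "green", "blue"]

# Placeholder names standing in for the 63 binarization algorithms.
ALGORITHMS = [f"algorithm_{i:02d}" for i in range(1, 64)]

# Every (algorithm, input version) pair is one binarization scheme.
SCHEMES = list(product(ALGORITHMS, INPUT_VERSIONS))


def mean_color(patch):
    """Average (R, G, B) of a texture patch given as a list of RGB tuples.

    Assumption: mean color is a crude stand-in for a real texture feature.
    """
    n = len(patch)
    return tuple(sum(pixel[c] for pixel in patch) / n for c in range(3))


def select_scheme(patch, reference):
    """Return the scheme stored for the reference texture nearest to the patch."""
    feature = mean_color(patch)

    def squared_distance(entry):
        ref_feature, _scheme = entry
        return sum((a - b) ** 2 for a, b in zip(feature, ref_feature))

    return min(reference, key=squared_distance)[1]


# Hypothetical lookup table: reference texture feature -> scheme judged
# to give a good quality-time trade-off on documents with that texture.
REFERENCE = [
    ((230.0, 220.0, 190.0), ("algorithm_07", "gray")),
    ((180.0, 150.0, 110.0), ("algorithm_21", "red")),
]

# A tiny texture sample taken from the paper background of a scanned page.
patch = [(182, 149, 112), (178, 152, 108), (181, 151, 111)]

print(len(SCHEMES))                     # 63 algorithms x 5 versions = 315
print(select_scheme(patch, REFERENCE))  # ('algorithm_21', 'red')
```

The point of the sketch is the shape of the approach: the expensive comparison of all 315 schemes is done once per reference texture, and a new document only needs a cheap texture lookup to pick a scheme.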