Sulaiman Alaa, Omar Khairuddin, Nasrudin Mohammad F
Pattern Recognition Research Group, Centre for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia (UKM), Bangi 43600, Selangor, Malaysia.
J Imaging. 2019 Apr 12;5(4):48. doi: 10.3390/jimaging5040048.
In this era of digitization, most hardcopy documents are being transformed into digital formats. In the process, large quantities of documents are stored and preserved through electronic scanning. These documents come from various sources, such as ancient documentation, old legal records, medical reports, music scores, palm-leaf manuscripts, and reports on security-related issues. Ancient and historical documents in particular are hard to read because of degradation in the form of low contrast and corrupted artefacts. Degraded document binarization has recently been studied widely, and several approaches have been developed to address its issues and challenges. This paper presents a comprehensive review of the issues and challenges faced during image binarization, followed by insights into the various methods used for it. It also discusses advanced methods for enhancing degraded documents, which improve document quality during the binarization process. Finally, it discusses the effectiveness and robustness of existing methods and concludes that there is still scope to develop a hybrid approach that deals with degraded document binarization more effectively.