A Deep-Learning Based Visual Sensing Concept for a Robust Classification of Document Images under Real-World Hard Conditions.

Author information

Mohsenzadegan Kabeh, Tavakkoli Vahid, Kyamakya Kyandoghere

Affiliation

Institute for Smart Systems Technologies, University Klagenfurt, 9020 Klagenfurt, Austria.

Publication information

Sensors (Basel). 2021 Oct 12;21(20):6763. doi: 10.3390/s21206763.

DOI:10.3390/s21206763

PMID:34695977

Original link: https://pmc.ncbi.nlm.nih.gov/articles/PMC8537789/

Abstract

The core objective of this paper is to develop and validate a new neurocomputing model for classifying document images under particularly demanding conditions, such as image distortions, variance in image size and scale, and a very large number of classes. Document classification is a machine vision task in which each document image is assigned to the category it most likely belongs to. It is an important topic for the digital office and has several uses. Various studies have presented methods for solving this problem, but the performance they reach is not yet good enough, and the task remains very challenging. Thus, a novel, more accurate and precise model is needed. Although the related works do reach acceptable accuracy under less demanding conditions, they generally fail under the hard, real-world conditions mentioned above, which include, amongst others, distortions such as noise, blur, low contrast, and shadows. In this paper, a novel deep CNN model is developed, validated, and benchmarked against a selection of the most relevant recent document classification models. Additionally, the model's sensitivity was significantly improved by injecting different artifacts during the training process. In the benchmarking, it clearly outperforms all other models by at least 4%, reaching more than 96% accuracy.
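The abstract names four distortion types (noise, blur, low contrast, shadows) and states that artifacts were injected during training. The following is only a minimal sketch of what such artifact injection could look like; it is not the authors' implementation. It assumes grayscale images stored as 2-D NumPy float arrays with values in [0, 1], and every function name and parameter value (`sigma`, `k`, `factor`, `strength`) is a hypothetical choice made for illustration.

```python
import numpy as np


def add_gaussian_noise(img, rng, sigma=0.05):
    """Add zero-mean Gaussian noise, then clip back to [0, 1]."""
    noisy = img + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(noisy, 0.0, 1.0)


def box_blur(img, k=3):
    """Blur with a k x k mean filter (k odd), replicating edge pixels."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    height, width = img.shape
    # Sum the k*k shifted copies of the image, then average them.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + height, dx:dx + width]
    return out / (k * k)


def reduce_contrast(img, factor=0.5):
    """Pull every pixel toward the image mean (factor < 1 lowers contrast)."""
    mean = img.mean()
    return np.clip(mean + factor * (img - mean), 0.0, 1.0)


def add_shadow(img, strength=0.5):
    """Darken the image with a linear left-to-right brightness ramp."""
    width = img.shape[1]
    ramp = np.linspace(1.0, 1.0 - strength, width)
    return img * ramp[np.newaxis, :]


def random_artifact(img, rng):
    """Apply one randomly chosen artifact to a float image in [0, 1]."""
    ops = [
        lambda x: add_gaussian_noise(x, rng),
        box_blur,
        reduce_contrast,
        add_shadow,
    ]
    op = ops[rng.integers(len(ops))]
    return op(img)
```

In a training loop, a function like `random_artifact` would typically be applied to each image before it is fed to the network, so that the classifier sees distorted as well as clean samples. How the artifacts were actually generated and combined in the paper is not stated in the abstract.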
