
Performance analysis of hybrid deep learning framework using a vision transformer and convolutional neural network for handwritten digit recognition.

Authors

Agrawal Vanita, Jagtap Jayant, Patil Shruti, Kotecha Ketan

Affiliations

Department of Computer Science and Information Technology, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, Maharashtra, India.

NIMS Institute of Computing, Artificial Intelligence and Machine Learning, NIMS University Rajasthan, Jaipur, India.

Publication information

MethodsX. 2024 Jan 5;12:102554. doi: 10.1016/j.mex.2024.102554. eCollection 2024 Jun.

Abstract

Digitization has created a demand for highly efficient handwritten document recognition systems. A handwritten document consists of digits, text, symbols, diagrams, etc., and digits are an essential element of such documents. Accurate recognition of handwritten digits is vital for effective communication and data analysis. Various researchers have attempted to address this problem with modern convolutional neural network (CNN) techniques. However, once a CNN is trained, its filter weights remain fixed, so even a highly accurate model cannot flexibly adapt to changes in the input. Hence, computer vision researchers have recently become interested in Vision Transformers (ViTs) and Multilayer Perceptrons (MLPs). The shortcomings of CNNs gave rise to hybrid models that combine the best elements of both approaches. This paper analyzes how a hybrid convolutional ViT model affects the ability to recognize handwritten digits. Because real-world data contains noise, distortions, and varying writing styles, both cleaned and uncleaned handwritten digit images are used for evaluation. The accuracy of the proposed method is compared with state-of-the-art techniques, and the results show that the proposed model achieves the highest recognition accuracy. Probable solutions for recognizing other aspects of handwritten documents are also discussed.

•Analyzed the effect of a convolutional vision transformer on cleaned and real-time handwritten digit images.

•The model's performance improved with the application of cross-validation and hyper-parameter tuning.

•The results show that the proposed model is robust, feasible, and effective on cleaned and uncleaned handwritten digits.
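As a rough illustration of the kind of hybrid convolutional-ViT architecture the abstract refers to, the sketch below uses a small convolutional stem to turn a digit image into feature-map tokens that a standard Transformer encoder then processes before a linear classification head. It is a minimal sketch, not the authors' exact model: the 28×28 grayscale input size (MNIST-style), layer widths, depth, and other hyper-parameters are illustrative assumptions.

```python
# Minimal hybrid CNN + Vision Transformer sketch for digit classification.
# Assumptions (not from the paper): 28x28 grayscale inputs, a small conv stem,
# and illustrative hyper-parameter values.
import torch
import torch.nn as nn


class ConvViT(nn.Module):
    def __init__(self, num_classes=10, embed_dim=64, depth=4, num_heads=4):
        super().__init__()
        # Convolutional stem: extracts local features and downsamples 28x28 -> 7x7,
        # so the transformer sees 49 tokens instead of raw pixel patches.
        self.stem = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1),          # 28 -> 14
            nn.ReLU(),
            nn.Conv2d(32, embed_dim, kernel_size=3, stride=2, padding=1),  # 14 -> 7
            nn.ReLU(),
        )
        num_tokens = 7 * 7
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens + 1, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=embed_dim * 4,
            batch_first=True, norm_first=True,
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                          # x: (B, 1, 28, 28)
        feats = self.stem(x)                       # (B, embed_dim, 7, 7)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, 49, embed_dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        encoded = self.encoder(tokens)             # (B, 50, embed_dim)
        return self.head(encoded[:, 0])            # classify from the [CLS] token


if __name__ == "__main__":
    model = ConvViT()
    logits = model(torch.randn(8, 1, 28, 28))
    print(logits.shape)  # torch.Size([8, 10])
```

Using the convolutional stem as the tokenizer is one common way to combine the two model families; the paper's actual hybrid design, cross-validation protocol, and tuned hyper-parameters should be taken from the article itself.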


Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d8d4/10825681/d26d6f2332c8/ga1.jpg
