
Performance analysis of hybrid deep learning framework using a vision transformer and convolutional neural network for handwritten digit recognition.

Authors

Agrawal Vanita, Jagtap Jayant, Patil Shruti, Kotecha Ketan

Affiliations

Department of Computer Science and Information Technology, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, Maharashtra, India.

NIMS Institute of Computing, Artificial Intelligence and Machine Learning, NIMS University Rajasthan, Jaipur, India.

Publication information

MethodsX. 2024 Jan 5;12:102554. doi: 10.1016/j.mex.2024.102554. eCollection 2024 Jun.

Abstract

Digitization has created a demand for highly efficient handwritten document recognition systems. A handwritten document consists of digits, text, symbols, diagrams, etc., and digits are an essential element of such documents. Accurate recognition of handwritten digits is vital for effective communication and data analysis. Various researchers have attempted to address this problem with modern convolutional neural network (CNN) techniques. However, once a CNN is trained, its filter weights remain fixed, so even a highly accurate model cannot flexibly adapt to changes in the input. Hence, computer vision researchers have recently become interested in Vision Transformers (ViTs) and Multilayer Perceptrons (MLPs). The shortcomings of CNNs gave rise to hybrid models that combine the best elements of both approaches. This paper analyzes how a hybrid convolutional ViT model affects the ability to recognize handwritten digits. Because real-world data contains noise, distortions, and varying writing styles, both cleaned and uncleaned handwritten digit images are used for evaluation. The accuracy of the proposed method is compared with state-of-the-art techniques, and the results show that the proposed model achieves the highest recognition accuracy. Probable solutions for recognizing other aspects of handwritten documents are also discussed.

•Analyzed the effect of a convolutional vision transformer on cleaned and real-time handwritten digit images.

•The model's performance improved with the application of cross-validation and hyper-parameter tuning.

•The results show that the proposed model is robust, feasible, and effective on cleaned and uncleaned handwritten digits.
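As a rough illustration of the kind of hybrid convolutional-ViT architecture the abstract refers to, the sketch below uses a small convolutional stem to turn a digit image into feature-map tokens that a standard Transformer encoder then processes before a linear classification head. It is a minimal sketch, not the authors' exact model: the 28×28 grayscale input size (MNIST-style), layer widths, depth, and other hyper-parameters are illustrative assumptions.

```python
# Minimal hybrid CNN + Vision Transformer sketch for digit classification.
# Assumptions (not from the paper): 28x28 grayscale inputs, a small conv stem,
# and illustrative hyper-parameter values.
import torch
import torch.nn as nn


class ConvViT(nn.Module):
    def __init__(self, num_classes=10, embed_dim=64, depth=4, num_heads=4):
        super().__init__()
        # Convolutional stem: extracts local features and downsamples 28x28 -> 7x7,
        # so the transformer sees 49 tokens instead of raw pixel patches.
        self.stem = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1),          # 28 -> 14
            nn.ReLU(),
            nn.Conv2d(32, embed_dim, kernel_size=3, stride=2, padding=1),  # 14 -> 7
            nn.ReLU(),
        )
        num_tokens = 7 * 7
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens + 1, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=embed_dim * 4,
            batch_first=True, norm_first=True,
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                          # x: (B, 1, 28, 28)
        feats = self.stem(x)                       # (B, embed_dim, 7, 7)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, 49, embed_dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        encoded = self.encoder(tokens)             # (B, 50, embed_dim)
        return self.head(encoded[:, 0])            # classify from the [CLS] token


if __name__ == "__main__":
    model = ConvViT()
    logits = model(torch.randn(8, 1, 28, 28))
    print(logits.shape)  # torch.Size([8, 10])
```

Using the convolutional stem as the tokenizer is one common way to combine the two model families; the paper's actual hybrid design, cross-validation protocol, and tuned hyper-parameters should be taken from the article itself.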


Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d8d4/10825681/d26d6f2332c8/ga1.jpg
