

PF-ViT: Parallel and Fast Vision Transformer for Offline Handwritten Chinese Character Recognition.

Affiliations

School of Electronic Information, Zhongyuan University of Technology, Zhengzhou 450007, Henan, China.

Publication Information

Comput Intell Neurosci. 2022 Sep 28;2022:8255763. doi: 10.1155/2022/8255763. eCollection 2022.

Abstract

Recently, the Vision Transformer (ViT) has been widely used in the field of image recognition. Unfortunately, the ViT model repeatedly stacks 12 encoder layers, resulting in heavy computation, a large number of parameters, and slow training, making it difficult to deploy on mobile devices. In order to reduce the computational complexity of the model and improve training speed, a parallel and fast Vision Transformer method for offline handwritten Chinese character recognition is proposed. The method adds parallel branches of the encoder module to the structure of the Vision Transformer model; the parallel modes include two-way, four-way, and seven-way parallelism. The original image is fed to the encoder module after patch flattening and linear embedding. The core step in the encoder is the multihead self-attention mechanism, which learns the interdependence between image sequence blocks. In addition, data augmentation strategies are used to increase the diversity of the data. In the two-way parallel experiment, when the model reaches 98.1% accuracy on the dataset, the number of parameters and the number of FLOPs are 43.11 million and 4.32 G, respectively. Compared with the ViT model, whose parameters and FLOPs are 86 million and 16.8 G, respectively, the two-way parallel model has a 50.1% decrease in parameters and a 34.6% decrease in FLOPs. This method is shown to effectively reduce the computational complexity of the model while indirectly improving image recognition speed.
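To make the parallel-branch idea concrete, below is a minimal, hypothetical PyTorch sketch of a two-way parallel ViT-style classifier: patches are flattened and linearly embedded, each branch is a shallow stack of multihead self-attention encoder layers, and the branch outputs are fused by averaging. The branch depth, the averaging fusion, the 3755-class output (GB2312 level-1 characters), and all hyperparameters are illustrative assumptions and do not reproduce the authors' implementation.

```python
# Hypothetical sketch of a two-way parallel ViT-style encoder for handwritten
# Chinese character recognition; assumes 224x224 grayscale inputs, 16x16 patches,
# and PyTorch. Not the PF-ViT authors' code.
import torch
import torch.nn as nn

class ParallelViT(nn.Module):
    def __init__(self, image_size=224, patch_size=16, in_chans=1,
                 embed_dim=768, depth_per_branch=6, num_branches=2,
                 num_heads=12, num_classes=3755):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # Flatten each patch and project it with a linear embedding
        # (implemented here as a strided convolution).
        self.patch_embed = nn.Conv2d(in_chans, embed_dim,
                                     kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        # Each parallel branch is a shallow stack of Transformer encoder layers
        # with multihead self-attention; all branches see the same token sequence.
        self.branches = nn.ModuleList([
            nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                           dim_feedforward=4 * embed_dim,
                                           batch_first=True),
                num_layers=depth_per_branch)
            for _ in range(num_branches)
        ])
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        b = x.shape[0]
        x = self.patch_embed(x).flatten(2).transpose(1, 2)   # (B, N, D)
        cls = self.cls_token.expand(b, -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        # Fuse the parallel branches by averaging (one possible fusion choice).
        x = torch.stack([branch(x) for branch in self.branches]).mean(dim=0)
        x = self.norm(x)
        return self.head(x[:, 0])                             # classify on the CLS token

model = ParallelViT()
logits = model(torch.randn(2, 1, 224, 224))
print(logits.shape)  # torch.Size([2, 3755])
```

Splitting the 12 sequential encoder layers into shallower branches that run side by side is one plausible reading of the "two-way parallel" mode described in the abstract; the exact fusion strategy and per-branch depth should be taken from the paper itself.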


