
Lightweight transformer image feature extraction network.

Authors

Zheng Wenfeng, Lu Siyu, Yang Youshuai, Yin Zhengtong, Yin Lirong

Affiliations

School of Automation, University of Electronic Science and Technology of China, Chengdu, Sichuan, China.

College of Resource and Environment Engineering, Guizhou University, Guiyang, Guizhou, China.

Publication

PeerJ Comput Sci. 2024 Jan 31;10:e1755. doi: 10.7717/peerj-cs.1755. eCollection 2024.

Abstract

In recent years, image feature extraction based on the Transformer has become a research hotspot. However, when a Transformer is used for image feature extraction, the model's complexity grows quadratically with the number of input tokens. This quadratic complexity prevents vision-transformer-based backbone networks from modelling high-resolution images and is computationally expensive. To address this issue, this study proposes two approaches to speed up Transformer models. First, the self-attention mechanism's quadratic complexity is reduced to linear, improving the model's internal processing speed. Second, a parameter-free lightweight pruning method is introduced, which adaptively samples input images to filter out unimportant tokens, effectively reducing irrelevant input. Finally, the two methods are combined into an efficient attention mechanism. Experimental results show that each method alone reduces the computation of the original Transformer model by 30%-50%, while the combined efficient attention mechanism achieves a 60%-70% reduction.
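The two ideas in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the linear attention below uses the well-known elu(x)+1 kernel feature map, and the "parameter-free pruning" is approximated by a token-norm top-k heuristic, since the abstract does not specify the paper's adaptive sampling criterion.

```python
import numpy as np

def linear_attention(Q, K, V):
    """Kernelized attention in O(n*d^2) instead of O(n^2*d).

    Uses the feature map phi(x) = elu(x) + 1; the paper summarized
    above may linearize self-attention differently.
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # strictly positive
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                    # (d, d_v): one summary of all keys/values
    Z = Qp @ Kp.sum(axis=0)          # (n,): per-query normalizer
    return (Qp @ KV) / Z[:, None]

def prune_tokens(X, keep_ratio=0.5):
    """Parameter-free token pruning (illustrative heuristic): keep the
    tokens with the largest L2 norm, preserving their original order."""
    k = max(1, int(len(X) * keep_ratio))
    idx = np.argsort(-np.linalg.norm(X, axis=1))[:k]
    return X[np.sort(idx)]

rng = np.random.default_rng(0)
n, d = 128, 16
X = rng.normal(size=(n, d))          # n tokens of dimension d
X = prune_tokens(X, keep_ratio=0.5)  # 128 -> 64 tokens before attention
out = linear_attention(X, X, X)      # self-attention on the kept tokens
print(out.shape)                     # (64, 16)
```

Computing `Kp.T @ V` before multiplying by `Qp` is what removes the quadratic term: the n-by-n attention matrix is never materialized, so cost scales linearly in the token count, and pruning shrinks that count further before attention runs.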

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9215/11636678/96cbc79b893a/peerj-cs-10-1755-g001.jpg
