

Nyströmformer: A Nyström-based Algorithm for Approximating Self-Attention.

Author Information

Xiong Yunyang, Zeng Zhanpeng, Chakraborty Rudrasis, Tan Mingxing, Fung Glenn, Li Yin, Singh Vikas

Affiliations

University of Wisconsin-Madison.

UC Berkeley.

Publication Information

Proc AAAI Conf Artif Intell. 2021;35(16):14138-14148. Epub 2021 May 18.

Abstract

Transformers have emerged as a powerful tool for a broad range of natural language processing tasks. A key component that drives the impressive performance of Transformers is the self-attention mechanism that encodes the influence or dependence of other tokens on each specific token. While beneficial, the quadratic complexity of self-attention on the input sequence length has limited its application to longer sequences - a topic being actively studied in the community. To address this limitation, we propose Nyströmformer - a model that exhibits favorable scalability as a function of sequence length. Our idea is based on adapting the Nyström method to approximate standard self-attention with O(n) complexity. The scalability of Nyströmformer enables application to longer sequences with thousands of tokens. We perform evaluations on multiple downstream tasks on the GLUE benchmark and IMDB reviews with standard sequence length, and find that our Nyströmformer performs comparably, or in a few cases, even slightly better, than standard self-attention. On longer sequence tasks in the Long Range Arena (LRA) benchmark, Nyströmformer performs favorably relative to other efficient self-attention methods. Our code is available at https://github.com/mlpen/Nystromformer.
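The abstract describes the idea only at a high level; the sketch below illustrates how a Nyström-style approximation of softmax attention can reduce the quadratic cost to linear in sequence length. It is a minimal NumPy illustration under stated assumptions, not the released implementation: the function name nystrom_attention and the landmark count are placeholders, landmarks are taken as simple segment means of the queries and keys, and np.linalg.pinv stands in for the iterative Moore-Penrose approximation used in the actual model, which also handles masking and adds a depthwise-convolution skip connection.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def nystrom_attention(Q, K, V, num_landmarks=64):
    """Nystrom-style approximation of softmax attention (illustrative sketch).

    Q, K, V: arrays of shape (n, d); n is assumed divisible by num_landmarks
    for the simple segment-mean landmark step. Cost is O(n * num_landmarks)
    rather than O(n^2).
    """
    n, d = Q.shape
    m = num_landmarks
    scale = 1.0 / np.sqrt(d)

    # Landmarks: segment means of the queries and keys (m landmarks each).
    Q_land = Q.reshape(m, n // m, d).mean(axis=1)
    K_land = K.reshape(m, n // m, d).mean(axis=1)

    # Three small softmax kernels replace the full n x n attention matrix.
    F = softmax(Q @ K_land.T * scale)        # (n, m)
    A = softmax(Q_land @ K_land.T * scale)   # (m, m)
    B = softmax(Q_land @ K.T * scale)        # (m, n)

    # softmax(QK^T / sqrt(d)) is approximated by F @ pinv(A) @ B.
    # np.linalg.pinv is used here for clarity; the paper uses an iterative
    # approximation of the Moore-Penrose pseudoinverse instead.
    return F @ np.linalg.pinv(A) @ (B @ V)   # (n, d)

# Usage: compare against exact attention on a short random sequence.
rng = np.random.default_rng(0)
n, d = 512, 64
Q, K, V = rng.standard_normal((3, n, d))
exact = softmax(Q @ K.T / np.sqrt(d)) @ V
approx = nystrom_attention(Q, K, V, num_landmarks=64)
print(np.abs(exact - approx).mean())
```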


Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/182c/8570649/59a0ae5a8cde/nihms-1729972-f0002.jpg

Similar Articles

Token Selection is a Simple Booster for Vision Transformers.
IEEE Trans Pattern Anal Mach Intell. 2023 Nov;45(11):12738-12746. doi: 10.1109/TPAMI.2022.3208922. Epub 2023 Oct 3.

Convolution-Enhanced Evolving Attention Networks.
IEEE Trans Pattern Anal Mach Intell. 2023 Jul;45(7):8176-8192. doi: 10.1109/TPAMI.2023.3236725. Epub 2023 Jun 5.

Multi-tailed vision transformer for efficient inference.
Neural Netw. 2024 Jun;174:106235. doi: 10.1016/j.neunet.2024.106235. Epub 2024 Mar 14.

P2T: Pyramid Pooling Transformer for Scene Understanding.
IEEE Trans Pattern Anal Mach Intell. 2023 Nov;45(11):12760-12771. doi: 10.1109/TPAMI.2022.3202765. Epub 2023 Oct 3.

