FLAT: Fusing layer representations for more efficient transfer learning in NLP.

Affiliations

School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China.

Publication Information

Neural Netw. 2024 Nov;179:106631. doi: 10.1016/j.neunet.2024.106631. Epub 2024 Aug 13.

Abstract

Parameter-efficient transfer learning (PETL) methods provide an efficient alternative to full fine-tuning. However, typical PETL methods inject the same structures into all Pre-trained Language Model (PLM) layers and use only the final hidden states for downstream tasks, regardless of the knowledge diversity across PLM layers. Additionally, the backpropagation path of existing PETL methods still passes through the frozen PLM during training, which is computationally and memory inefficient. In this paper, we propose FLAT, a generic PETL method that explicitly and individually combines knowledge across all PLM layers on a per-token basis to achieve better transfer. FLAT treats the backbone PLM as a feature extractor and combines the features in a side-network, so backpropagation does not involve the PLM, resulting in a much lower memory requirement than previous methods. Results on the GLUE benchmark show that FLAT outperforms other tuning techniques in low-resource scenarios and achieves on-par performance in high-resource scenarios with only 0.53% trainable parameters per task and 3.2× less GPU memory usage with BERT. A further ablation study reveals that the proposed fusion layer effectively combines knowledge from the PLM and helps the classifier exploit PLM knowledge for downstream tasks. We will release our code for better reproducibility.
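To make the described architecture concrete, the following is a minimal, hypothetical sketch (not the authors' released implementation) of the general idea in the abstract: a frozen PLM serves as a feature extractor, a small trainable side-network fuses its per-layer hidden states with token-wise weights, and a classifier head reads the fused representation, so backpropagation never passes through the PLM. It assumes the HuggingFace transformers library with bert-base-uncased as the backbone; the module and variable names (LayerFusion, FLATClassifier, etc.) are assumptions, not taken from the paper.

```python
# Minimal sketch of a frozen-PLM + fusion side-network setup (assumed names, not the paper's code).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class LayerFusion(nn.Module):
    """Token-wise weighted combination of all PLM layer representations."""

    def __init__(self, num_layers: int, hidden_size: int):
        super().__init__()
        # Each token predicts its own mixing weights over the PLM layers.
        self.scorer = nn.Linear(hidden_size, num_layers)

    def forward(self, layer_states: torch.Tensor) -> torch.Tensor:
        # layer_states: (num_layers, batch, seq_len, hidden)
        last = layer_states[-1]                               # (batch, seq, hidden)
        weights = torch.softmax(self.scorer(last), dim=-1)    # (batch, seq, num_layers)
        stacked = layer_states.permute(1, 2, 0, 3)            # (batch, seq, layers, hidden)
        # Weighted sum over the layer dimension, computed per token.
        return (weights.unsqueeze(-1) * stacked).sum(dim=2)   # (batch, seq, hidden)


class FLATClassifier(nn.Module):
    """Frozen PLM feature extractor + trainable fusion side-network + task head."""

    def __init__(self, model_name: str = "bert-base-uncased", num_labels: int = 2):
        super().__init__()
        self.plm = AutoModel.from_pretrained(model_name, output_hidden_states=True)
        self.plm.requires_grad_(False)  # the backbone stays frozen
        cfg = self.plm.config
        self.fusion = LayerFusion(cfg.num_hidden_layers + 1, cfg.hidden_size)
        self.head = nn.Linear(cfg.hidden_size, num_labels)

    def forward(self, **inputs) -> torch.Tensor:
        # Extract features without building a graph through the PLM,
        # so backpropagation only touches the small side-network.
        with torch.no_grad():
            hidden = self.plm(**inputs).hidden_states         # tuple of (batch, seq, hidden)
        fused = self.fusion(torch.stack(hidden))              # (batch, seq, hidden)
        return self.head(fused[:, 0])                         # classify from the [CLS] token


if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = FLATClassifier()
    batch = tok(["a tiny example sentence"], return_tensors="pt")
    print(model(**batch).shape)  # torch.Size([1, 2])
```

Because the PLM forward pass runs under torch.no_grad(), only the fusion layer and the classifier head accumulate gradients, which mirrors the reduced training-memory footprint described in the abstract.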
