
AUBER: Automated BERT regularization.

Affiliations

Columbia University, New York, NY, United States of America.

Seoul National University, Seoul, Republic of Korea.

Publication Information

PLoS One. 2021 Jun 28;16(6):e0253241. doi: 10.1371/journal.pone.0253241. eCollection 2021.

DOI: 10.1371/journal.pone.0253241
PMID: 34181664
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8238198/
Abstract

How can we effectively regularize BERT? Although BERT proves its effectiveness in various NLP tasks, it often overfits when there are only a small number of training instances. A promising direction to regularize BERT is based on pruning its attention heads with a proxy score for head importance. However, these methods are usually suboptimal since they resort to arbitrarily determined numbers of attention heads to be pruned and do not directly aim for the performance enhancement. In order to overcome such a limitation, we propose AUBER, an automated BERT regularization method, that leverages reinforcement learning to automatically prune the proper attention heads from BERT. We also minimize the model complexity and the action search space by proposing a low-dimensional state representation and dually-greedy approach for training. Experimental results show that AUBER outperforms existing pruning methods by achieving up to 9.58% better performance. In addition, the ablation study demonstrates the effectiveness of design choices for AUBER.
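
For orientation, the sketch below illustrates the baseline approach the abstract contrasts AUBER against: score every attention head with a proxy importance measure and prune a fixed fraction of the lowest-scoring heads via Hugging Face's prune_heads API. This is not the authors' implementation; the proxy score (mean attention mass on a single example) and the 10% prune ratio are illustrative assumptions, precisely the kind of arbitrary, fixed choices that AUBER replaces with heads selected by a reinforcement-learning agent.

import torch
from transformers import BertModel, BertTokenizer

# Hypothetical setup: a real run would score heads over a fine-tuning corpus;
# a single sentence stands in here to keep the sketch self-contained.
model = BertModel.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("BERT often overfits on small training sets.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions: one tensor per layer, shaped (batch, num_heads, seq_len, seq_len)
scores = {}
for layer, attn in enumerate(outputs.attentions):
    per_head = attn.mean(dim=(0, 2, 3))  # proxy importance per head (assumed metric)
    for head, s in enumerate(per_head.tolist()):
        scores[(layer, head)] = s

# Prune the 10% lowest-scoring heads -- the arbitrarily fixed budget the
# abstract criticizes; AUBER instead learns which heads to remove.
num_prune = int(0.1 * len(scores))
lowest = sorted(scores, key=scores.get)[:num_prune]

heads_to_prune = {}
for layer, head in lowest:
    heads_to_prune.setdefault(layer, []).append(head)

model.prune_heads(heads_to_prune)  # expects {layer_index: [head_indices]}
print(f"Pruned {num_prune} of {len(scores)} attention heads.")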

Figures (PMC full-text images):
Fig 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/620c/8238198/e30e450ad27e/pone.0253241.g001.jpg
Fig 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/620c/8238198/4818944d0315/pone.0253241.g002.jpg
Fig 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/620c/8238198/4663cba2113e/pone.0253241.g003.jpg
Fig 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/620c/8238198/a613aed652ca/pone.0253241.g004.jpg

Similar Articles

1. AUBER: Automated BERT regularization.
   PLoS One. 2021 Jun 28;16(6):e0253241. doi: 10.1371/journal.pone.0253241. eCollection 2021.
2. DDK: Dynamic structure pruning based on differentiable search and recursive knowledge distillation for BERT.
   Neural Netw. 2024 May;173:106164. doi: 10.1016/j.neunet.2024.106164. Epub 2024 Feb 9.
3. SensiMix: Sensitivity-Aware 8-bit index & 1-bit value mixed precision quantization for BERT compression.
   PLoS One. 2022 Apr 18;17(4):e0265621. doi: 10.1371/journal.pone.0265621. eCollection 2022.
4. Use of BERT (Bidirectional Encoder Representations from Transformers)-Based Deep Learning Method for Extracting Evidences in Chinese Radiology Reports: Development of a Computer-Aided Liver Cancer Diagnosis Framework.
   J Med Internet Res. 2021 Jan 12;23(1):e19689. doi: 10.2196/19689.
5. GT-Finder: Classify the family of glucose transporters with pre-trained BERT language models.
   Comput Biol Med. 2021 Apr;131:104259. doi: 10.1016/j.compbiomed.2021.104259. Epub 2021 Feb 7.
6. What does Chinese BERT learn about syntactic knowledge?
   PeerJ Comput Sci. 2023 Jul 26;9:e1478. doi: 10.7717/peerj-cs.1478. eCollection 2023.
7. BertMCN: Mapping colloquial phrases to standard medical concepts using BERT and highway network.
   Artif Intell Med. 2021 Feb;112:102008. doi: 10.1016/j.artmed.2021.102008. Epub 2021 Jan 7.
8. Extracting comprehensive clinical information for breast cancer using deep learning methods.
   Int J Med Inform. 2019 Dec;132:103985. doi: 10.1016/j.ijmedinf.2019.103985. Epub 2019 Oct 2.
9. Adapting Bidirectional Encoder Representations from Transformers (BERT) to Assess Clinical Semantic Textual Similarity: Algorithm Development and Validation Study.
   JMIR Med Inform. 2021 Feb 3;9(2):e22795. doi: 10.2196/22795.
10. iAMP-Attenpred: a novel antimicrobial peptide predictor based on BERT feature extraction method and CNN-BiLSTM-Attention combination model.
    Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad443.

Cited By

1. SensiMix: Sensitivity-Aware 8-bit index & 1-bit value mixed precision quantization for BERT compression.
   PLoS One. 2022 Apr 18;17(4):e0265621. doi: 10.1371/journal.pone.0265621. eCollection 2022.