

Improving text mining in plant health domain with GAN and/or pre-trained language model.

Authors

Jiang Shufan, Cormier Stéphane, Angarita Rafael, Rousseaux Francis

Affiliations

CReSTIC, Université de Reims Champagne Ardenne, Reims, France.

LISITE, Institut Supérieur d'Electronique de Paris, Paris, France.

Publication

Front Artif Intell. 2023 Feb 21;6:1072329. doi: 10.3389/frai.2023.1072329. eCollection 2023.

DOI:10.3389/frai.2023.1072329
PMID:36895200
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9989305/
Abstract

The Bidirectional Encoder Representations from Transformers (BERT) architecture offers a cutting-edge approach to Natural Language Processing. It involves two steps: (1) pre-training a language model to extract contextualized features and (2) fine-tuning for specific downstream tasks. Although pre-trained language models (PLMs) have been successful in various text-mining applications, challenges remain, particularly in areas with limited labeled data such as plant health hazard detection from individuals' observations. To address this challenge, we propose to combine GAN-BERT, a model that extends the fine-tuning process with unlabeled data through a Generative Adversarial Network (GAN), with ChouBERT, a domain-specific PLM. Our results show that GAN-BERT outperforms traditional fine-tuning in multiple text classification tasks. In this paper, we examine the impact of further pre-training on the GAN-BERT model. We experiment with different hyperparameters to determine the best combination of models and fine-tuning parameters. Our findings suggest that the combination of GAN and ChouBERT can enhance the generalizability of the text classifier but may also lead to increased instability during training. Finally, we provide recommendations to mitigate these instabilities.
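The semi-supervised setup the abstract describes can be made concrete: GAN-BERT trains a discriminator head over BERT sentence features with K real classes plus one extra "fake" class, while a generator maps noise to fake feature vectors. The NumPy forward-pass sketch below shows the loss terms only; the dimensions, class count, single-layer generator/discriminator, and random stand-ins for BERT [CLS] features are illustrative assumptions, and GAN-BERT's feature-matching term and backpropagation are omitted. It is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Illustrative dimensions: BERT hidden size 768, K = 2 real classes.
HIDDEN, NOISE, K = 768, 100, 2

# Generator: maps noise to fake "BERT-like" feature vectors (one layer here).
W_g = rng.normal(0, 0.02, (NOISE, HIDDEN))
# Discriminator head: K + 1 logits -- K real classes plus one "fake" class.
W_d = rng.normal(0, 0.02, (HIDDEN, K + 1))

def gan_bert_losses(h_labeled, y, h_unlabeled, noise):
    """Forward-pass sketch of the GAN-BERT objective (no backprop shown)."""
    h_fake = np.tanh(noise @ W_g)          # generator output
    p_lab = softmax(h_labeled @ W_d)       # labeled real examples
    p_unl = softmax(h_unlabeled @ W_d)     # unlabeled real examples
    p_fake = softmax(h_fake @ W_d)         # generated examples
    eps = 1e-12
    # Supervised term: cross-entropy on the true class of labeled examples.
    l_sup = -np.mean(np.log(p_lab[np.arange(len(y)), y] + eps))
    # Unsupervised term: real (unlabeled) examples should not look fake...
    l_real = -np.mean(np.log(1.0 - p_unl[:, K] + eps))
    # ...and generated examples should be assigned to the fake class (index K).
    l_fake = -np.mean(np.log(p_fake[:, K] + eps))
    d_loss = l_sup + l_real + l_fake
    # Generator tries to make fakes look real (low fake-class probability).
    g_loss = -np.mean(np.log(1.0 - p_fake[:, K] + eps))
    return d_loss, g_loss

# Toy batch: random stand-ins for BERT [CLS] features.
h_lab = rng.normal(size=(8, HIDDEN))
y = rng.integers(0, K, size=8)
h_unl = rng.normal(size=(16, HIDDEN))
z = rng.normal(size=(16, NOISE))
d_loss, g_loss = gan_bert_losses(h_lab, y, h_unl, z)
print(d_loss, g_loss)
```

The unsupervised terms are what let the unlabeled observations contribute to fine-tuning: they push the discriminator's shared BERT features to separate real from generated text even where no class labels exist.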


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08aa/9989305/3c64d837e420/frai-06-1072329-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08aa/9989305/00c42a849f83/frai-06-1072329-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08aa/9989305/fa1ab56fb595/frai-06-1072329-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08aa/9989305/6f39c67d99bb/frai-06-1072329-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08aa/9989305/642ce2fca632/frai-06-1072329-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08aa/9989305/834502372187/frai-06-1072329-g0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08aa/9989305/672601719f33/frai-06-1072329-g0007.jpg

Similar Articles

1. Improving text mining in plant health domain with GAN and/or pre-trained language model.
Front Artif Intell. 2023 Feb 21;6:1072329. doi: 10.3389/frai.2023.1072329. eCollection 2023.
2. When BERT meets Bilbo: a learning curve analysis of pretrained language model on disease classification.
BMC Med Inform Decis Mak. 2022 Apr 5;21(Suppl 9):377. doi: 10.1186/s12911-022-01829-2.
3. BioBERT: a pre-trained biomedical language representation model for biomedical text mining.
Bioinformatics. 2020 Feb 15;36(4):1234-1240. doi: 10.1093/bioinformatics/btz682.
4. LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators.
Sensors (Basel). 2022 Nov 13;22(22):8761. doi: 10.3390/s22228761.
5. BERT-based Ranking for Biomedical Entity Normalization.
AMIA Jt Summits Transl Sci Proc. 2020 May 30;2020:269-277. eCollection 2020.
6. Bidirectional Encoder Representations from Transformers-like large language models in patient safety and pharmacovigilance: A comprehensive assessment of causal inference implications.
Exp Biol Med (Maywood). 2023 Nov;248(21):1908-1917. doi: 10.1177/15353702231215895. Epub 2023 Dec 12.
7. Adversarial active learning for the identification of medical concepts and annotation inconsistency.
J Biomed Inform. 2020 Aug;108:103481. doi: 10.1016/j.jbi.2020.103481. Epub 2020 Jul 18.
8. A Question-and-Answer System to Extract Data From Free-Text Oncological Pathology Reports (CancerBERT Network): Development Study.
J Med Internet Res. 2022 Mar 23;24(3):e27210. doi: 10.2196/27210.
9. Automatic text classification of actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformer (BERT) and in-domain pre-training (IDPT).
BMC Med Inform Decis Mak. 2022 Jul 30;22(1):200. doi: 10.1186/s12911-022-01946-y.
10. Investigation of improving the pre-training and fine-tuning of BERT model for biomedical relation extraction.
BMC Bioinformatics. 2022 Apr 4;23(1):120. doi: 10.1186/s12859-022-04642-w.

Cited By

1. Multilingual Hate Speech Detection: A Semi-Supervised Generative Adversarial Approach.
Entropy (Basel). 2024 Apr 18;26(4):344. doi: 10.3390/e26040344.

References

1. Validating GAN-BioBERT: A Methodology for Assessing Reporting Trends in Clinical Trials.
Front Digit Health. 2022 May 24;4:878369. doi: 10.3389/fdgth.2022.878369. eCollection 2022.
2. BioBERT: a pre-trained biomedical language representation model for biomedical text mining.
Bioinformatics. 2020 Feb 15;36(4):1234-1240. doi: 10.1093/bioinformatics/btz682.
3. From Observation to Information: Data-Driven Understanding of on Farm Yield Variation.
PLoS One. 2016 Mar 1;11(3):e0150015. doi: 10.1371/journal.pone.0150015. eCollection 2016.