Ensemble of Deep Masked Language Models for Effective Named Entity Recognition in Health and Life Science Corpora.

Authors

Naderi Nona, Knafou Julien, Copara Jenny, Ruch Patrick, Teodoro Douglas

Affiliations

Information Science Department, University of Applied Sciences and Arts of Western Switzerland (HES-SO), Geneva, Switzerland.

Swiss Institute of Bioinformatics, Geneva, Switzerland.

Publication information

Front Res Metr Anal. 2021 Nov 19;6:689803. doi: 10.3389/frma.2021.689803. eCollection 2021.

DOI:10.3389/frma.2021.689803
PMID:34870074
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8640190/
Abstract

The health and life science domains are well known for the wealth of named entities found in their large free-text corpora, such as scientific literature and electronic health records. To unlock the value of such corpora, named entity recognition (NER) methods have been proposed. Inspired by the success of transformer-based pretrained models for NER, we assess how individual deep masked language models and ensembles of them perform across corpora from different health and life science domains (biology, chemistry, and medicine) and in different languages (English and French). Individual deep masked language models, pretrained on external corpora, are fine-tuned on task-specific domain and language corpora and ensembled using classical majority voting strategies. Experiments show a statistically significant improvement of the ensemble models over an individual BERT-based baseline model, with an overall best performance of 77% macro F1-score. We further perform a detailed analysis of the ensemble results and show how their effectiveness changes according to entity properties, such as length, corpus frequency, and annotation consistency. The results suggest that ensembles of deep masked language models are an effective strategy for tackling NER across corpora from the health and life science domains.
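
The abstract names two mechanisms without spelling them out: majority voting over the outputs of several fine-tuned models, and the macro F1-score. The sketch below illustrates both at the token level. It is a simplified illustration, not the authors' implementation: the label names (`B-CHEM`, `B-DISO`), the three model outputs, the strict-majority rule with an `"O"` fallback, and the token-level (rather than entity-level) scoring are all assumptions made for the example.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]], fallback: str = "O") -> List[str]:
    """Combine per-token labels from several models by strict majority.

    predictions[m][t] is the label model m assigns to token t.
    A label wins only if more than half of the models agree on it;
    otherwise the token receives `fallback` (an assumed tie-breaking rule).
    """
    n_models = len(predictions)
    n_tokens = len(predictions[0])
    if any(len(p) != n_tokens for p in predictions):
        raise ValueError("all models must label the same tokens")
    voted = []
    for t in range(n_tokens):
        label, count = Counter(p[t] for p in predictions).most_common(1)[0]
        voted.append(label if count > n_models / 2 else fallback)
    return voted


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical outputs of three fine-tuned models on a four-token sentence.
gold = ["B-CHEM", "I-CHEM", "O", "B-DISO"]
model_a = ["B-CHEM", "I-CHEM", "O", "B-DISO"]
model_b = ["B-CHEM", "O", "O", "B-DISO"]
model_c = ["B-CHEM", "I-CHEM", "O", "O"]

voted = majority_vote([model_a, model_b, model_c])
print(voted)
print(macro_f1(gold, model_b), macro_f1(gold, voted))
```

In this toy case each of the two weaker models makes a different single-token error, so the vote recovers the gold sequence. Because macro averaging gives every label equal weight regardless of frequency, one missed rare label lowers the score noticeably.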
