
A neural network multi-task learning approach to biomedical named entity recognition.

Authors

Crichton Gamal, Pyysalo Sampo, Chiu Billy, Korhonen Anna

Affiliation

Language Technology Laboratory, DTAL, University of Cambridge, 9 West Road, Cambridge, CB3 9DB, UK.

Publication

BMC Bioinformatics. 2017 Aug 15;18(1):368. doi: 10.1186/s12859-017-1776-8.

DOI: 10.1186/s12859-017-1776-8
PMID: 28810903
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5558737/
Abstract

BACKGROUND

Named Entity Recognition (NER) is a key task in biomedical text mining. Accurate NER systems require task-specific, manually annotated datasets, which are expensive to develop and thus limited in size. Since such datasets contain related but different information, an interesting question is whether they can be used together to improve NER performance. To investigate this, we developed supervised, multi-task, convolutional neural network models and applied them to a large number of varied existing biomedical named entity datasets. Additionally, we investigated the effect of dataset size on performance in both single- and multi-task settings.
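The abstract does not spell out the network architecture, so the following is only a minimal sketch of one common way to build a multi-task convolutional tagger: hard parameter sharing, where a single convolution layer is shared by all tasks and each task (dataset) has its own output layer. The task names, label sets, layer sizes and helper functions are all hypothetical, the weights are random and untrained, and nothing here should be read as the authors' implementation.

```python
import random

random.seed(0)

EMB_DIM, HIDDEN, WINDOW = 4, 3, 3  # toy sizes, chosen only for illustration

# Hypothetical tasks (one per dataset), each with a BIO-style label set
TASK_LABELS = {
    "chemical": ["O", "B-Chemical", "I-Chemical"],
    "disease": ["O", "B-Disease", "I-Disease"],
}

def rand_matrix(rows, cols):
    """Random weight matrix as a list of rows (stands in for trained weights)."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    """Multiply matrix m by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

# Shared parameters: one convolution filter bank used by every task
shared_conv = rand_matrix(HIDDEN, WINDOW * EMB_DIM)
# Task-specific parameters: one linear output layer per task
heads = {t: rand_matrix(len(labels), HIDDEN) for t, labels in TASK_LABELS.items()}

def shared_features(embeddings):
    """Apply the shared convolution (zero-padded) and a ReLU at each token."""
    pad = [[0.0] * EMB_DIM] * (WINDOW // 2)
    padded = pad + embeddings + pad
    feats = []
    for i in range(len(embeddings)):
        window = [x for vec in padded[i:i + WINDOW] for x in vec]
        feats.append([max(0.0, h) for h in matvec(shared_conv, window)])
    return feats

def tag(embeddings, task):
    """Label each token using the shared features and the task's own head."""
    labels = TASK_LABELS[task]
    tags = []
    for h in shared_features(embeddings):
        scores = matvec(heads[task], h)
        tags.append(labels[scores.index(max(scores))])
    return tags

# A toy 5-token "sentence" of random embeddings
sentence = [[random.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in range(5)]
chem_tags = tag(sentence, "chemical")
dis_tags = tag(sentence, "disease")
```

In a trained version of this sketch, gradients from every dataset would update `shared_conv`, while each entry of `heads` would be updated only by its own dataset; that sharing is what lets related datasets inform one another.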

RESULTS

We present a single-task model for NER, a Multi-output multi-task model and a Dependent multi-task model. We apply the three models to 15 biomedical datasets containing multiple named entities including Anatomy, Chemical, Disease, Gene/Protein and Species. Each dataset represents a task. The results from the single-task model and the multi-task models are then compared for evidence of benefits from Multi-task Learning. With the Multi-output multi-task model we observed an average F-score improvement of 0.8% when compared to the single-task model from an average baseline of 78.4%. Although there was a significant drop in performance on one dataset, performance improved significantly for five datasets by up to 6.3%. For the Dependent multi-task model we observed an average improvement of 0.4% when compared to the single-task model. There were no significant drops in performance on any dataset, and performance improved significantly for six datasets by up to 1.1%. The dataset size experiments found that as dataset size decreased, the multi-output model's performance improved relative to the single-task model's. Using 50, 25 and 10% of the training data resulted in an average drop of approximately 3.4, 8 and 16.7% respectively for the single-task model but approximately 0.2, 3.0 and 9.8% for the multi-task model.
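The dataset-size figures above can be lined up to show how the gap between the two models widens as training data shrinks. The short calculation below uses only the numbers reported in the abstract; the variable names are illustrative, and it assumes the two sets of drops are expressed in the same units.

```python
# Average performance drops reported in the abstract, by share of training data
fractions = [50, 25, 10]          # percent of training data used
single_drop = [3.4, 8.0, 16.7]    # single-task model
multi_drop = [0.2, 3.0, 9.8]      # multi-output multi-task model

# How much smaller the multi-task drop is at each training-set size
gap = [round(s - m, 1) for s, m in zip(single_drop, multi_drop)]

for f, g in zip(fractions, gap):
    print(f"{f}% of data: multi-task drop is smaller by {g}")
```

The difference grows from 3.2 at 50% of the data to 5.0 at 25% and 6.9 at 10%, which is the pattern behind the statement that Multi-task Learning helps most on small datasets.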

CONCLUSIONS

Our results show that, on average, the multi-task models produced better NER results than the single-task models trained on a single NER dataset. We also found that Multi-task Learning is beneficial for small datasets. Across the various settings the improvements are significant, demonstrating the benefit of Multi-task Learning for this task.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7166/5558737/ab1d56247c56/12859_2017_1776_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7166/5558737/bb9f8eb35a5a/12859_2017_1776_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7166/5558737/460681758cca/12859_2017_1776_Fig3_HTML.jpg
