Hierarchical shared transfer learning for biomedical named entity recognition.

Author affiliations

College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China.

School of Public Health, Peking University, Beijing, China.

Publication information

BMC Bioinformatics. 2022 Jan 4;23(1):8. doi: 10.1186/s12859-021-04551-4.

Abstract

BACKGROUND

Biomedical named entity recognition (BioNER) is a basic and important medical information extraction task that extracts medical entities with special meaning from medical texts. In recent years, deep learning has become the main research direction of BioNER because of its strong data-driven context encoding ability. However, in BioNER tasks, deep learning models suffer from poor generalization and instability.

RESULTS

We propose hierarchical shared transfer learning, which combines multi-task learning and fine-tuning and realizes multi-level information fusion between the underlying entity features and the upper data features. We select 14 datasets containing 4 types of entities for training and evaluating the model. The experimental results showed that, compared with the single-task XLNet-CRF model, the F1-scores on the six gold standard datasets BC5CDR-chemical, BC5CDR-disease, BC2GM, BC4CHEMD, NCBI-disease and LINNAEUS changed by +0.57, +0.90, +0.42, +0.77, +0.98 and -2.16, respectively. BC5CDR-chemical, BC5CDR-disease and BC4CHEMD achieved state-of-the-art results. The reasons why the multi-task results on LINNAEUS are lower than the single-task results are discussed at the dataset level.
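The abstract reports F1-scores but does not say how they are computed. As a minimal sketch, assuming the exact-match, entity-level F1 over BIO tags that is common in BioNER evaluation (an assumption, not confirmed by the abstract), the metric could be computed as follows. The function names and the toy tag sequences are hypothetical and serve only as illustration.

```python
def extract_entities(tags):
    """Return the set of (start, end, type) spans in a BIO tag sequence.

    `end` is exclusive. An I- tag that does not continue an open span of
    the same type is ignored (a simplifying assumption of this sketch).
    """
    entities = set()
    start, etype = None, None
    # A trailing "O" acts as a sentinel that closes a span ending at the last token.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues_span = tag.startswith("I-") and etype == tag[2:]
        if not continues_span:
            if etype is not None:
                entities.add((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return entities


def entity_f1(gold_tags, pred_tags):
    """Exact-match entity-level F1: a prediction counts only if both the
    span boundaries and the entity type match a gold entity."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical tags): the chemical span matches exactly,
# while the predicted disease span is one token too long.
gold = ["B-Chemical", "I-Chemical", "O", "B-Disease", "O"]
pred = ["B-Chemical", "I-Chemical", "O", "B-Disease", "I-Disease"]
score = entity_f1(gold, pred)
```

Under this convention a boundary error counts as both a false positive and a false negative, so in the toy example one of the two entities matches and precision, recall and F1 are all 0.5.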

CONCLUSION

Compared with using multi-task learning or fine-tuning alone, the model recognizes medical entities more accurately and shows better generalization and stability.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/afa6/8729142/437bd8824256/12859_2021_4551_Fig1_HTML.jpg
