Knowledge-enhanced biomedical named entity recognition and normalization: application to proteins and genes.

Affiliation

School of Computer Science and Technology, Dalian University of Technology, Chuangxinyuan Building, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, Liaoning, China.

Publication information

BMC Bioinformatics. 2020 Jan 30;21(1):35. doi: 10.1186/s12859-020-3375-3.

Abstract

BACKGROUND

Automated biomedical named entity recognition and normalization serve as the basis for many downstream applications in information management. However, the task is challenging because of name variation and entity ambiguity: a biomedical entity may have multiple name variants, and a single variant may denote several different entity identifiers.
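The two problems named above can be made concrete with a toy lookup table. The sketch below is illustrative only: the identifiers are made-up placeholders rather than real database accessions, and `build_indexes` is a hypothetical helper, not part of the system described in this abstract.

```python
from collections import defaultdict

# Illustrative (name, identifier) pairs. The identifiers are placeholders,
# not real database accessions.
PAIRS = [
    ("p53", "ID:0001"),
    ("TP53", "ID:0001"),
    ("tumor protein p53", "ID:0001"),
    ("p53", "ID:0002"),  # same surface form, different entity (e.g. another species)
]

def build_indexes(pairs):
    """Return (id -> set of name variants, lowercased name -> set of ids)."""
    variants = defaultdict(set)
    candidates = defaultdict(set)
    for name, entity_id in pairs:
        variants[entity_id].add(name)
        candidates[name.lower()].add(entity_id)
    return variants, candidates

variants, candidates = build_indexes(PAIRS)

# Name variation: one identifier, several names.
print(sorted(variants["ID:0001"]))
# Entity ambiguity: one name, several candidate identifiers.
print(sorted(candidates["p53"]))
```

The first index shows why recognition needs broad name coverage; the second shows why a dictionary match alone cannot settle normalization.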

RESULTS

To address these issues, we present a novel knowledge-enhanced system for protein/gene named entity recognition (PNER) and normalization (PNEN). On the one hand, a large amount of entity name knowledge extracted from biomedical knowledge bases is used to recognize more entity variants. On the other hand, structural knowledge of entities is extracted and encoded as identifier (ID) embeddings, which are then used to improve entity normalization. In addition, deep contextualized word representations generated by pre-trained language models are incorporated into the system to model the multi-sense information of entities. Experimental results on the BioCreative VI Bio-ID corpus show that the proposed system achieves an F1-score of 0.871 for PNER and 0.445 for PNEN, a new state-of-the-art performance.
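The abstract does not specify how ID embeddings enter the normalization step, so the following is only a minimal sketch of one common pattern: rank the candidate identifiers of a mention by cosine similarity between a context vector and each candidate's ID embedding. All vectors, identifiers, and function names here are hypothetical. The `f1_score` helper uses the standard definition, F1 = 2PR / (P + R), which is the metric reported above.

```python
import math

# Hypothetical 3-dimensional ID embeddings; a real system would learn these
# from knowledge-base structure and use far more dimensions.
ID_EMBEDDINGS = {
    "ID:0001": [0.9, 0.1, 0.0],
    "ID:0002": [0.1, 0.8, 0.3],
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(context_vec, candidate_ids, id_embeddings):
    """Sort candidate IDs by similarity to the mention's context vector, best first."""
    scored = [(cosine(context_vec, id_embeddings[c]), c) for c in candidate_ids]
    return [c for _, c in sorted(scored, reverse=True)]

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Assumed context vector for an ambiguous mention (e.g. from a contextual encoder).
context = [0.2, 0.7, 0.4]
ranking = rank_candidates(context, ["ID:0001", "ID:0002"], ID_EMBEDDINGS)
print(ranking)
```

In this toy setup the context vector lies closer to the second embedding, so that identifier is ranked first; the point is only that a learned representation of each ID gives the normalizer something to compare against beyond the surface name.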

CONCLUSIONS

We propose a knowledge-enhanced system that combines entity knowledge with deep contextualized word representations. Comparative results show that entity knowledge benefits both the PNER and PNEN tasks and combines well with contextualized information in our system for further improvement.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dfed/6990512/e42e3a52cb22/12859_2020_3375_Fig1_HTML.jpg
