Biomedical named entity recognition using BERT in the machine reading comprehension framework.

Affiliations

School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.

Publication information

J Biomed Inform. 2021 Jun;118:103799. doi: 10.1016/j.jbi.2021.103799. Epub 2021 May 6.

Abstract

Recognizing biomedical entities in the literature is a challenging research problem and the foundation for converting the large body of biomedical knowledge held in unstructured text into structured formats. Biomedical named entity recognition (BioNER) is conventionally implemented as a sequence labeling task. This approach, however, often cannot take full advantage of the semantic information in the dataset, and its performance is not always satisfactory. In this work, instead of treating BioNER as a sequence labeling problem, we formulate it as a machine reading comprehension (MRC) problem. This formulation can introduce more prior knowledge through well-designed queries and no longer needs a decoding process such as conditional random fields (CRF). We conduct experiments on six BioNER datasets, and the results demonstrate the effectiveness of our method. Our method achieves state-of-the-art (SOTA) performance on the BC4CHEMD, BC5CDR-Chem, BC5CDR-Disease, NCBI-Disease, BC2GM and JNLPBA datasets, with F1-scores of 92.92%, 94.19%, 87.83%, 90.04%, 85.48% and 78.93%, respectively.
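
To make the reformulation concrete, the following is a minimal sketch of how a BIO-tagged sentence might be recast as MRC-style examples. It rests on assumptions not stated in the abstract: one natural-language query per entity type, with per-token start and end labels marking each answer span, which is a common way to cast NER as MRC. The query wording, the function names (`bio_to_spans`, `to_mrc_examples`, `decode_spans`) and the nearest-end pairing heuristic are all hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical queries; the abstract does not give the actual query text.
QUERIES: Dict[str, str] = {
    "Disease": "Find all disease mentions in the text.",
    "Chemical": "Find all chemical mentions in the text.",
}


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Collect (start, end, type) token spans (end inclusive) from BIO tags."""
    spans: List[Tuple[int, int, str]] = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        inside = tag.startswith("I-") and start is not None and tag[2:] == etype
        if not inside:
            if start is not None:
                spans.append((start, i - 1, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return spans


def to_mrc_examples(tokens: List[str], tags: List[str],
                    queries: Dict[str, str] = QUERIES) -> List[dict]:
    """Build one (query, context) example per entity type with start/end labels."""
    spans = bio_to_spans(tags)
    examples = []
    for etype, query in queries.items():
        start_labels = [0] * len(tokens)
        end_labels = [0] * len(tokens)
        for s, e, t in spans:
            if t == etype:
                start_labels[s] = 1
                end_labels[e] = 1
        examples.append({"type": etype, "query": query, "context": tokens,
                         "start": start_labels, "end": end_labels})
    return examples


def decode_spans(start_labels: List[int], end_labels: List[int]) -> List[Tuple[int, int]]:
    """Pair each start with the nearest end at or after it (assumed heuristic)."""
    spans = []
    for s, is_start in enumerate(start_labels):
        if is_start:
            for e in range(s, len(end_labels)):
                if end_labels[e]:
                    spans.append((s, e))
                    break
    return spans


if __name__ == "__main__":
    tokens = ["Aspirin", "can", "cause", "gastric", "ulcer", "."]
    tags = ["B-Chemical", "O", "O", "B-Disease", "I-Disease", "O"]
    for ex in to_mrc_examples(tokens, tags):
        found = decode_spans(ex["start"], ex["end"])
        print(ex["type"], [" ".join(tokens[s:e + 1]) for s, e in found])
```

In this sketch the entity type is carried by the query rather than by the tag set, which is where prior knowledge can enter, and spans are read directly from the start and end predictions, so no CRF-style decoding layer is involved.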

