Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa110.
Biomedical information extraction (BioIE) is the task of analyzing biomedical texts to extract structured information such as named entities and the semantic relations between them. In recent years, pre-trained language models have substantially improved BioIE performance. However, they neglect external structured knowledge, which can provide rich factual information to support the understanding and reasoning that biomedical information extraction requires. In this paper, we first evaluate current extraction methods, including vanilla neural networks, general language models and pre-trained contextualized language models, on biomedical information extraction tasks such as named entity recognition, relation extraction and event extraction. We then propose to enrich a contextualized language model by integrating large-scale biomedical knowledge graphs (the resulting model is named BioKGLM). To encode knowledge effectively, we explore a three-stage training procedure and introduce different fusion strategies to facilitate knowledge injection. Experimental results on multiple tasks show that BioKGLM consistently outperforms state-of-the-art extraction models. Further analysis shows that BioKGLM captures the underlying relations between biomedical knowledge concepts, which are crucial for BioIE.
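The abstract mentions fusion strategies for injecting knowledge-graph embeddings into a contextualized language model but does not spell them out. As a purely illustrative sketch (not the paper's actual method), one common fusion strategy is a learned gate that mixes each token's contextual representation with the embedding of its linked knowledge-graph entity; all function names, weight shapes and dimensions below are assumptions for demonstration.

```python
import numpy as np

def gated_fusion(token_states, entity_embs, W_p, W_g, b_g):
    """Hypothetical gated fusion: mix contextual token states with
    projected KG entity embeddings via a sigmoid gate."""
    k = entity_embs @ W_p                        # project KG embeddings into LM space
    z = np.concatenate([token_states, k], -1)    # (seq, 2*hidden) gate input
    g = 1.0 / (1.0 + np.exp(-(z @ W_g + b_g)))   # per-dimension sigmoid gate
    # Tokens with no linked entity can pass zero vectors for entity_embs,
    # letting the gate fall back toward the textual representation.
    return g * token_states + (1.0 - g) * k

# Toy dimensions for illustration only.
rng = np.random.default_rng(0)
hidden, kg_dim, seq = 8, 4, 5
W_p = rng.standard_normal((kg_dim, hidden))
W_g = rng.standard_normal((2 * hidden, hidden))
b_g = np.zeros(hidden)
tokens = rng.standard_normal((seq, hidden))     # stand-in for LM hidden states
entities = rng.standard_normal((seq, kg_dim))   # stand-in for aligned entity embeddings
out = gated_fusion(tokens, entities, W_p, W_g, b_g)
print(out.shape)  # (5, 8)
```

In a real knowledge-enhanced model the gate parameters would be trained jointly with the language model, which is the kind of coupling the paper's staged training procedure is designed to manage.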