Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea.
Bio-Synergy Research Center, Daejeon, South Korea.
BMC Bioinformatics. 2021 Oct 21;22(Suppl 11):337. doi: 10.1186/s12859-021-04248-8.
Concept recognition refers to the two sequential steps of named entity recognition and named entity normalization, and it plays an essential role in bioinformatics. However, conventional dictionary-based methods do not sufficiently address the variation in how concepts are actually written in the literature, which leads to particularly degraded performance in recognizing multi-token concepts.
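To make the limitation of dictionary-based methods concrete, the following is a minimal, hypothetical sketch of a conventional dictionary lookup (greedy longest match over whitespace tokens). The dictionary entries, the `CONCEPT:` identifiers, and the function name `dictionary_recognize` are illustrative assumptions; they are not taken from the paper or from any specific knowledge base, and the sketch is not the authors' method.

```python
# Toy dictionary mapping lowercased surface forms to concept IDs.
# Entries and IDs are hypothetical placeholders for illustration only.
DICTIONARY = {
    "tumor necrosis factor": "CONCEPT:0001",
    "breast cancer": "CONCEPT:0002",
}
# Longest dictionary entry, measured in tokens.
MAX_LEN = max(len(name.split()) for name in DICTIONARY)


def dictionary_recognize(text):
    """Greedy longest-match lookup; returns (mention, concept_id) pairs.

    Recognition and normalization happen in one step: a span is both
    detected and mapped to an ID only if it matches an entry exactly.
    """
    tokens = text.lower().split()
    results = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in DICTIONARY:
                results.append((span, DICTIONARY[span]))
                i += n
                break
        else:
            # No entry starts at this token; move on.
            i += 1
    return results


print(dictionary_recognize("Tumor necrosis factor levels in breast cancer"))
print(dictionary_recognize("tumour necrosis factor alpha in cancer of the breast"))
```

The first call finds both concepts, but the second returns an empty list: a spelling variant ("tumour") and a reordered multi-token phrase ("cancer of the breast") refer to the same concepts yet match no dictionary entry exactly. This is the kind of variation that exact lookup cannot absorb.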
In this paper, we propose a method for recognizing multi-token biological concepts using neural models combined with literature contexts. The key aspect of our method is the use of contextual information from biological knowledge bases for concept normalization, which is followed by a named entity recognition procedure. The model outperformed conventional methods, particularly on multi-token concepts with high variation.
We expect that our model can be used for effective concept recognition and for a variety of natural language processing tasks in bioinformatics.