Wang Yu, Tong Hanghang, Zhu Ziye, Hou Fengzhen, Li Yun
School of Science, China Pharmaceutical University, Nanjing, China.
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
BMC Bioinformatics. 2025 Feb 25;26(1):63. doi: 10.1186/s12859-025-06086-4.
Named entity recognition (NER) is a fundamental task in natural language processing. Recognizing entities in biomedical text, known as biomedical named entity recognition (BioNER), is particularly crucial for cutting-edge applications. However, BioNER is more challenging than traditional NER because biomedical entities exhibit (1) nested structures and (2) correlations between entity categories. Recently, various BioNER models have been developed based on region classification or large language models. Despite their success, these models still struggle to balance the handling of nested structures with the capture of category knowledge.
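Why nested structures defeat a flat tagger, and how region classification sidesteps them, can be shown with a short sketch. It assumes a hypothetical four-token example in which "IL-2" is nested inside "IL-2 receptor alpha chain"; the tokens, labels, and helper names (`enumerate_spans`, `is_nested`, `fits_flat_bio`) are illustrative only and are not taken from the paper.

```python
from typing import Dict, Iterable, List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, both inclusive

def enumerate_spans(n_tokens: int, max_len: int) -> List[Span]:
    """All candidate regions of at most max_len tokens."""
    return [(i, j) for i in range(n_tokens)
            for j in range(i, min(n_tokens, i + max_len))]

def is_nested(inner: Span, outer: Span) -> bool:
    """True if `inner` lies inside `outer` and the two spans differ."""
    return inner != outer and outer[0] <= inner[0] and inner[1] <= outer[1]

def fits_flat_bio(entities: Iterable[Span]) -> bool:
    """One BIO tag per token can encode the entities only if none share a token."""
    covered = set()
    for start, end in entities:
        toks = set(range(start, end + 1))
        if covered & toks:
            return False
        covered |= toks
    return True

# Hypothetical example: "IL-2" is nested inside "IL-2 receptor alpha chain".
tokens = ["IL-2", "receptor", "alpha", "chain"]
gold: Dict[Span, str] = {(0, 0): "protein", (0, 3): "protein"}

spans = enumerate_spans(len(tokens), max_len=4)
# Region classification assigns one label per candidate span, so the outer
# entity and the entity nested inside it are both representable.
targets = {s: gold.get(s, "O") for s in spans}
```

Because each span is labeled independently, overlap is not a problem; the cost is that the number of candidates grows roughly as n × max_len, which is why span-based models usually cap the span length.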
We present a novel parallel BioNER model, BEAN, designed to address the unique properties of biomedical entities while achieving a reasonable balance between handling nested structures and incorporating category correlations. Extensive experiments on five public NER datasets, including four biomedical datasets, demonstrate that BEAN achieves state-of-the-art performance.
BEAN is designed to achieve the two key objectives of the BioNER task: accurately detecting entity boundaries and correctly classifying entity categories. It is the first BioNER model to handle nested structures and category correlations in parallel. We exploit head, tail, and contextualized features to efficiently detect entity boundaries via a triaffine model. To the best of our knowledge, we are also the first to introduce a multi-label classification model for the BioNER task, which extracts entity category information without boundary guidance.
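The two branches described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not BEAN's implementation: the abstract does not say how the head, tail, and context features are built, so here the token vectors serve directly as head and tail features, the context feature is a mean-pooled span vector, and the category branch applies one independent sigmoid per category to a mean-pooled sentence vector. The triaffine score follows the general form $s = \sum_{a,b,k} W_{abk}\, h'_a\, t'_b\, c'_k$, where each feature vector is augmented with a constant 1. All helper names (`triaffine`, `boundary_prob`, `category_probs`, `bce`, `zeros_tensor`) and toy values are hypothetical.

```python
import math
from typing import List, Sequence

Vec = List[float]
Tensor3 = List[List[List[float]]]

def augment(v: Sequence[float]) -> Vec:
    # Appending a constant 1 lets the triaffine form include bilinear,
    # linear, and bias terms as well as the three-way product.
    return list(v) + [1.0]

def zeros_tensor(d: int) -> Tensor3:
    """(d+1)^3 zero weight tensor matching the augmented feature size."""
    return [[[0.0] * (d + 1) for _ in range(d + 1)] for _ in range(d + 1)]

def triaffine(h: Sequence[float], t: Sequence[float], c: Sequence[float],
              W: Tensor3) -> float:
    """s = sum_{a,b,k} W[a][b][k] * h'[a] * t'[b] * c'[k] on augmented vectors."""
    h, t, c = augment(h), augment(t), augment(c)
    return sum(W[a][b][k] * h[a] * t[b] * c[k]
               for a in range(len(h)) for b in range(len(t)) for k in range(len(c)))

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def span_context(tokens: List[Vec], i: int, j: int) -> Vec:
    # Hypothetical context feature: mean of the token vectors in [i, j].
    n = j - i + 1
    return [sum(tok[d] for tok in tokens[i:j + 1]) / n for d in range(len(tokens[0]))]

def boundary_prob(tokens: List[Vec], i: int, j: int, W: Tensor3) -> float:
    # Boundary branch: head = token i, tail = token j, context = span mean.
    return sigmoid(triaffine(tokens[i], tokens[j], span_context(tokens, i, j), W))

def category_probs(tokens: List[Vec], weights: List[Vec], biases: List[float]) -> List[float]:
    # Category branch: independent sigmoids over a sentence vector; no span
    # boundaries are used, and the probabilities need not sum to 1.
    pooled = span_context(tokens, 0, len(tokens) - 1)
    return [sigmoid(sum(w * x for w, x in zip(wc, pooled)) + b)
            for wc, b in zip(weights, biases)]

def bce(probs: Sequence[float], targets: Sequence[int]) -> float:
    # Multi-label binary cross-entropy, summed over categories.
    eps = 1e-12
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, targets))

# Toy usage with 2-dimensional token vectors (values are arbitrary).
toks = [[1.0, 0.0], [0.5, 1.0], [2.0, 0.0]]
W = zeros_tensor(2)
W[0][0][0] = 1.0                          # score reduces to h[0] * t[0] * c[0]
p_span = boundary_prob(toks, 0, 2, W)     # boundary score for span (0, 2)
p_cats = category_probs(toks, [[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0])
```

Both branches read the same token vectors and neither depends on the other's output, so they can be computed side by side. This mirrors, in a simplified way, the parallel design the abstract describes. Using independent sigmoids rather than a softmax is what makes the category branch multi-label: a sentence can be assigned several categories at once.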