Discontinuous named entities in clinical text: A systematic literature review.

Authors

Alhassan Areej, Schlegel Viktor, Aloud Monira, Batista-Navarro Riza, Nenadic Goran

Affiliations

University of Manchester, United Kingdom; King Saud University, Saudi Arabia.

University of Manchester, United Kingdom; Imperial Global, Singapore.

Publication details

J Biomed Inform. 2025 Feb;162:104783. doi: 10.1016/j.jbi.2025.104783. Epub 2025 Jan 23.

Abstract

OBJECTIVE

Extracting named entities from clinical free text presents unique challenges, particularly when dealing with discontinuous entities, that is, mentions whose parts are separated by unrelated words. Traditional named entity recognition (NER) methods often struggle to accurately identify these entities, prompting the development of specialised computational solutions. This paper systematically reviews and presents the methodologies developed for discontinuous named entity recognition in clinical texts, highlighting their effectiveness and the challenges they face.
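
To make the notion of a discontinuous entity concrete, the following is a minimal sketch of one way to represent it: an entity holds several token spans rather than a single one. The `Entity` class, its field names, and the example sentence are illustrative assumptions and are not taken from the reviewed paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entity:
    """Hypothetical container: a label plus one or more token spans."""
    label: str
    spans: Tuple[Tuple[int, int], ...]  # (start, end) token offsets, end exclusive

    def is_discontinuous(self) -> bool:
        # An entity is discontinuous when a gap separates two consecutive spans.
        ordered = sorted(self.spans)
        return any(nxt[0] > cur[1] for cur, nxt in zip(ordered, ordered[1:]))

    def text(self, tokens: List[str]) -> str:
        # Join the covered tokens in document order, skipping the gaps.
        return " ".join(" ".join(tokens[s:e]) for s, e in sorted(self.spans))


# Illustrative sentence (an assumption, not from the paper's corpora).
tokens = "patient has muscle pain and fatigue".split()

# "muscle pain" is a single contiguous span.
muscle_pain = Entity("Symptom", ((2, 4),))

# "muscle ... fatigue" skips the unrelated tokens "pain and".
muscle_fatigue = Entity("Symptom", ((2, 3), (5, 6)))

print(muscle_pain.text(tokens), muscle_pain.is_discontinuous())
print(muscle_fatigue.text(tokens), muscle_fatigue.is_discontinuous())
```

A single-span tagging scheme can recover "muscle pain" here but has no direct way to express "muscle fatigue", which is the kind of difficulty the abstract attributes to traditional NER methods.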

METHOD

We conducted a systematic literature review focused on discontinuous named entities, using structured searches across four Computer Science-related and one medical-related electronic database. A combination of search terms, grouped into three synonym categories (problem, entity/approach, and task), yielded 2,442 articles. Guided by our research objectives, we identified five key dimensions to systematically annotate and normalise the data for comprehensive analysis.
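
The abstract gives neither the actual search terms nor the boolean logic used to combine them. As a rough sketch of how such a structured search string is commonly assembled, the snippet below assumes synonyms are joined with OR within a category and the categories are joined with AND. The function name `build_query` and every term in `categories` are hypothetical placeholders, not the authors' search strategy.

```python
from typing import Dict, List


def build_query(categories: Dict[str, List[str]]) -> str:
    """Join synonyms with OR inside each category, then AND the categories."""
    groups = []
    for terms in categories.values():
        quoted = " OR ".join(f'"{term}"' for term in terms)
        groups.append(f"({quoted})")
    return " AND ".join(groups)


# Placeholder terms for illustration only; the real terms are not in the abstract.
categories = {
    "problem": ["discontinuous", "non-contiguous"],
    "entity_or_approach": ["named entity", "entity mention"],
    "task": ["recognition", "extraction"],
}

query = build_query(categories)
print(query)
```

Individual databases differ in their query syntax, so a string built this way would normally need adapting for each one.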

RESULT

The review included 44 studies which were coded across several key dimensions: the chronological development of approaches, the corpora used, the downstream tasks affected by discontinuous named entities, the methodological approaches proposed to address the issue, and the reported performance outcomes. The discussion section examines the challenges encountered in this area and suggests potential directions for future research.

CONCLUSION

Significant progress has been made in discontinuous named entity recognition; however, there remains a need for more adaptable, generalisable solutions that are independent of custom annotation schemes. Exploring various configurations of generative language models presents a promising avenue for advancing this area. Additionally, future research should investigate the impact of precise versus imprecise recognition of discontinuous entities on clinical downstream tasks to better understand its practical implications in healthcare applications.
