Artificial intelligence in healthcare text processing: a review applied to named entity recognition.

Author information

de Almeida Samuel Santana, Silva Fontes Raphael, Pareja Credidio Freire Alves Luca, Júnior Methanias Colaço, José Pinheiro Caldeira Silva Gleyson, Ramalho Cortez Lyane, de Morais Antonio Higor Freire, Medeiros Machado Guilherme, Gonçalo Oliveira Hugo, Cunha-Oliveira Aliete, Dos Santos João Paulo Queiroz, de Medeiros Valentim Ricardo Alexsandro

Affiliations

Postgraduate Program in Computer Science (PROCC), Federal University of Sergipe, São Cristóvão, Brazil.

Center for Innovation and Advanced Technology (NAVI), Federal Institute of Rio Grande do Norte, Natal, Brazil.

Publication information

Front Artif Intell. 2025 Jul 7;8:1584203. doi: 10.3389/frai.2025.1584203. eCollection 2025.

Abstract

CONTEXT

Traditional methods such as rule-based systems, word embeddings (e.g. Word2Vec, GloVe) and sequence tagging models such as CRFs and HMMs have difficulty capturing the complex and nuanced context of medical texts, leading to low precision and inflexibility. These methods also struggle with the inherent variability of medical language and often require large and difficult-to-obtain labeled datasets.

OBJECTIVE

We examine the growing importance of Named Entity Recognition (NER) in the analysis of healthcare texts. NER, a fundamental technique in Natural Language Processing (NLP), automatically identifies and categorizes named entities in text, such as names of people and organizations and, in medical texts, medical conditions and drug names. This facilitates better information retrieval, personalized medicine approaches and clinical decision support systems.
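As a rough illustration of what "identifying and categorizing" entities means in practice, the sketch below groups token-level BIO tags (a common NER labeling convention) into typed entity spans. The sentence, the tag names `PROBLEM` and `DRUG`, and the helper `bio_to_spans` are hypothetical and are not taken from the review.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # Close any open entity, then start a new one.
            # A stray I- tag with a different type is treated as a beginning.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            # Continue the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O" tag: the token is outside any entity.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hypothetical clinical sentence with hand-assigned tags.
tokens = ["Patient", "with", "type", "2", "diabetes", "started", "metformin", "."]
tags = ["O", "O", "B-PROBLEM", "I-PROBLEM", "I-PROBLEM", "O", "B-DRUG", "O"]
print(bio_to_spans(tokens, tags))
```

In a real pipeline the tags would come from a trained model rather than being written by hand; only the decoding step is shown here.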

METHODS

A systematic mapping study was carried out, focusing on advanced language models, specifically transformer-based models such as BERT. These models are known for capturing complex semantic dependencies and linguistic nuances, which are crucial for accurate processing of medical texts. Unlike traditional techniques such as CNNs and RNNs, transformer architectures are better suited to the contextual and semantic nature of medical texts because they can handle long sequences and because such texts demand high precision.

RESULTS

The results indicate that transformer-based models, in particular BERT and its specialized variants (e.g. ClinicalBERT), consistently demonstrate high performance on NER tasks, with F1 scores often exceeding 97%, outperforming traditional and hybrid methods. When examining the geographical distribution of contributions, the research identifies a significant contribution from China, followed by the United States. These findings have crucial implications for the integration of NER technologies into the Brazilian National Health System (SUS).
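For readers unfamiliar with the metric, the sketch below shows one common way to compute an F1 score for NER: strict entity-level matching, where a prediction counts only if its boundaries and type both match the gold annotation. The abstract does not say which F1 variant the surveyed studies report, so this choice, the `entity_f1` helper, and the example spans are all assumptions made for illustration.

```python
from typing import Set, Tuple

# (start_token, end_token_exclusive, entity_type); a hypothetical span encoding.
Span = Tuple[int, int, str]


def entity_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under strict span-and-type matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up annotations: the third prediction has a wrong right boundary.
gold = {(2, 5, "PROBLEM"), (6, 7, "DRUG"), (10, 12, "PROBLEM")}
predicted = {(2, 5, "PROBLEM"), (6, 7, "DRUG"), (10, 11, "PROBLEM")}

precision, recall, f1 = entity_f1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under strict matching, a near-miss on an entity boundary counts as both a false positive and a false negative, which is why scores from studies using different matching rules are not directly comparable.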

CONCLUSION

This systematic review contributes to the advancement of NER in health texts by evaluating methods, showing results and highlighting the wider implications for the field. The article is systematically structured into the following sections: Methodology, Bibliometric analysis, Results and discussion, Threats to validity, Future work and Conclusion. This systematic organization provides a comprehensive review of the research, its impact and future directions, highlighting the importance of keeping up to date with advances in the field to increase the relevance of NER applications in healthcare.
