Artificial intelligence in healthcare text processing: a review applied to named entity recognition.

Author information

de Almeida Samuel Santana, Silva Fontes Raphael, Pareja Credidio Freire Alves Luca, Júnior Methanias Colaço, José Pinheiro Caldeira Silva Gleyson, Ramalho Cortez Lyane, de Morais Antonio Higor Freire, Medeiros Machado Guilherme, Gonçalo Oliveira Hugo, Cunha-Oliveira Aliete, Dos Santos João Paulo Queiroz, de Medeiros Valentim Ricardo Alexsandro

Affiliations

Postgraduate Program in Computer Science (PROCC), Federal University of Sergipe, São Cristóvão, Brazil.

Center for Innovation and Advanced Technology (NAVI), Federal Institute of Rio Grande do Norte, Natal, Brazil.

Publication information

Front Artif Intell. 2025 Jul 7;8:1584203. doi: 10.3389/frai.2025.1584203. eCollection 2025.

DOI:10.3389/frai.2025.1584203

PMID:40693280

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12277242/

Abstract

CONTEXT

Traditional methods such as rule-based systems, word embeddings (e.g., Word2Vec, GloVe) and sequence tagging models such as conditional random fields (CRFs) and hidden Markov models (HMMs) have difficulty capturing the complex and nuanced context of medical texts, which leads to low precision and inflexibility. These methods also struggle with the inherent variability of medical language and often require large labeled datasets that are difficult to obtain.

OBJECTIVE

We examine the growing importance of Named Entity Recognition (NER) in the analysis of healthcare texts. NER, a fundamental technique in Natural Language Processing (NLP), automatically identifies and categorizes named entities in text, such as names of people and organizations and, in medical texts, medical conditions and drug names. This supports better information retrieval, personalized medicine approaches and clinical decision support systems.
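
To make the task concrete: NER output is commonly represented as token-level BIO tags (B = beginning of an entity, I = inside, O = outside). The minimal Python sketch below collapses such tags into entity spans. The sentence, the tags, the label names `CONDITION` and `DRUG`, and the helper `decode_bio` are illustrative assumptions; none of them comes from the reviewed studies.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse token-level BIO tags into (entity_text, entity_type) pairs."""
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity; flush any open one first.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type extends the open entity.
            current_tokens.append(token)
        else:
            # "O" (or an I- tag that does not match the open entity)
            # closes the open entity; the token itself is not kept.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

    # Flush an entity that runs to the end of the sentence.
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Hypothetical example sentence with hand-assigned tags.
tokens = ["Patient", "with", "type", "2", "diabetes", "was", "prescribed", "metformin", "."]
tags = ["O", "O", "B-CONDITION", "I-CONDITION", "I-CONDITION", "O", "O", "B-DRUG", "O"]

entities = decode_bio(tokens, tags)
print(entities)
```

In a real system the tags would be predicted by a model rather than written by hand; the decoding step stays the same.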

METHODS

A systematic mapping was carried out, focused on advanced language models, specifically Transformer-based models such as BERT. These models are known for capturing complex semantic dependencies and linguistic nuances, which are crucial for accurate processing of medical texts. Unlike traditional techniques such as CNNs and RNNs, Transformer architectures are better suited to the contextual and semantic nature of medical texts because they can handle long sequences, which matters given the high precision these texts demand.

RESULTS

The results indicate that Transformer-based models, in particular BERT and its specialized variants (e.g., ClinicalBERT), consistently demonstrate high performance on NER tasks, with F1 scores often exceeding 97%, outperforming traditional and hybrid methods. When examining the geographical distribution of contributions, the research identifies a significant contribution from China, followed by the United States. These findings have crucial implications for the integration of NER technologies into the Brazilian National Health System (SUS).
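
For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it at the entity level with exact span-and-type matching, a common convention for NER evaluation. The abstract does not state which matching scheme the reviewed studies used, so this is an assumption, as are the example spans and the function name `entity_f1`.

```python
from typing import Set, Tuple

# (start_token, end_token_exclusive, entity_type); a hypothetical span format.
Span = Tuple[int, int, str]


def entity_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact span-and-type matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up annotations: two exact matches, one boundary error, one spurious span.
gold = {(2, 5, "CONDITION"), (7, 8, "DRUG"), (10, 12, "CONDITION")}
predicted = {(2, 5, "CONDITION"), (7, 8, "DRUG"), (10, 11, "CONDITION"), (14, 15, "DRUG")}

precision, recall, f1 = entity_f1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under exact matching, a prediction with the right type but a wrong boundary counts as both a false positive and a false negative, which is why the boundary error above lowers precision and recall at once.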

CONCLUSION

This systematic review contributes to the advancement of NER in health texts by evaluating methods, reporting results and highlighting the wider implications for the field. The article is organized into the following sections: Methodology, Bibliometric analysis, Results and discussion, Threats to validity, Future work and Conclusion. This organization provides a comprehensive view of the research, its impact and future directions, and underscores the importance of keeping up to date with advances in the field to increase the relevance of NER applications in healthcare.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d047/12277242/6b694008ee0b/frai-08-1584203-g001.jpg
