Elvas Luis B, Almeida Ana, Ferreira João C
Department of Logistics, Molde University College, Molde 6410, Norway; Inov Inesc Inovação - Instituto de Novas Tecnologias, 1000-029 Lisbon, Portugal; Breast Cancer Research Program, Champalimaud Foundation, Lisbon, Portugal; ISTAR, Instituto Universitário de Lisboa (ISCTE-IUL), 1649-026 Lisbon, Portugal.
ISTAR, Instituto Universitário de Lisboa (ISCTE-IUL), 1649-026 Lisbon, Portugal.
Int J Med Inform. 2025 Dec;204:106049. doi: 10.1016/j.ijmedinf.2025.106049. Epub 2025 Jul 17.
The exponential growth of digitized medical data has created significant challenges for healthcare professionals, as medical documentation transitions from simple text records to complex, multi-dimensional data structures. Natural Language Processing (NLP), particularly Named Entity Recognition (NER), has emerged as a crucial tool for extracting and categorizing critical information from clinical texts. The development of transformer-based models such as BERT, and the ability to fine-tune pre-trained AI models, have revolutionized the field, offering unprecedented opportunities for efficient and precise interpretation of medical data across diverse languages and healthcare contexts.
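To make the NER task concrete: clinical NER systems typically emit token-level BIO tags (B-X begins an entity of type X, I-X continues it, O is outside any entity), which are then grouped into entity spans. The sketch below illustrates this grouping step in plain Python; in practice the tags would come from a fine-tuned transformer model, and the tokens, tags, and the SYMPTOM label here are illustrative assumptions, not drawn from any reviewed study.

```python
# Sketch: grouping BIO tags into entity spans, the post-processing step
# behind most clinical NER pipelines. Tags are hard-coded for illustration;
# a fine-tuned BERT-style model would normally predict them.

def extract_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, label) spans."""
    entities, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):          # a new entity begins
            if current:
                entities.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(token)         # continue the open entity
        else:                             # O tag or inconsistent I- tag
            if current:
                entities.append((" ".join(current), label))
            current, label = [], None
    if current:                           # flush a trailing entity
        entities.append((" ".join(current), label))
    return entities

tokens = ["Patient", "denies", "chest", "pain", "since", "admission"]
tags   = ["O", "O", "B-SYMPTOM", "I-SYMPTOM", "O", "O"]
print(extract_entities(tokens, tags))  # [('chest pain', 'SYMPTOM')]
```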
This literature review aimed to analyze recent NLP approaches for medical text processing, examining techniques, performance metrics, and advancements across different languages and healthcare contexts.
Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology, a scoping search was conducted in the Scopus and PubMed databases, focusing on studies published between 2019 and 2024. The review included studies on language model fine-tuning and information extraction in healthcare, with a specific search query designed to capture relevant NLP techniques.
Of 67 initial records, 31 studies were ultimately included. Bidirectional Encoder Representations from Transformers (BERT)-based approaches, neural networks, and Conditional Random Field (CRF)/Long Short-Term Memory (LSTM) techniques dominated, consistently achieving F1-scores above 85%. The studies covered multiple languages, with 51.5% in English, 27.3% in Chinese, and smaller representations in Italian, German, and Spanish. Hybrid approaches and techniques addressing data privacy and limited labeled data were notably prevalent.
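The F1-score reported above is the harmonic mean of precision and recall over predicted entities. A minimal sketch of how it is computed, with illustrative counts that are assumptions for the example rather than figures from any reviewed study:

```python
# Sketch: entity-level F1 from true-positive (tp), false-positive (fp),
# and false-negative (fn) counts. The counts below are illustrative only.

def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)   # fraction of predicted entities that are correct
    recall = tp / (tp + fn)      # fraction of gold entities that were found
    return 2 * precision * recall / (precision + recall)

# e.g. 90 correctly extracted entities, 10 spurious, 8 missed:
print(round(f1_score(tp=90, fp=10, fn=8), 3))  # 0.909
```

A score of 0.909 would sit above the 85% threshold the included studies consistently reached.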
The review revealed that modern NLP techniques, particularly BERT-based models and hybrid approaches, show significant promise in medical text processing across different languages. While challenges remain in cross-lingual adaptation and data availability, these technologies demonstrate potential to enhance medical data interpretation and analysis.