Bilal Muhammad, Hamza Ameer, Malik Nadia
Department of Pharmaceutical Outcomes and Policy (M.B.), University of Florida, Gainesville, Florida, USA; Department of Software Engineering (M.B.), National University of Computer and Emerging Sciences, Islamabad, Pakistan.
Department of Computer Science (A.H.), Faculty of Computing and IT, University of Sargodha, Sargodha, Punjab, Pakistan.
J Pain Symptom Manage. 2025 May;69(5):e374-e394. doi: 10.1016/j.jpainsymman.2025.01.019. Epub 2025 Jan 31.
This review examines the application of natural language processing (NLP) techniques in cancer research using electronic health records (EHRs) and clinical notes. It addresses a gap in the existing literature by taking a broader perspective than previous studies, which focused on specific cancer types or applications. A comprehensive literature search of the Scopus database identified 94 relevant studies published between 2019 and 2024. The analysis revealed a growing trend in NLP applications for cancer research. Information extraction (47 studies) and text classification (40 studies) emerged as the predominant NLP tasks, followed by named entity recognition (7 studies). Breast, lung, and colorectal cancers were the most frequently studied cancer types. The review observed a marked shift from rule-based and traditional machine learning approaches toward deep learning techniques and transformer-based models. Dataset sizes varied widely across studies, from small, manually annotated datasets to large-scale EHR collections. The review highlighted key challenges, including the limited generalizability of proposed solutions and the need for better integration into clinical workflows. NLP techniques show significant potential for analyzing EHRs and clinical notes in cancer research. Future work should therefore focus on improving model generalizability, making models more robust to complex clinical language, and extending applications to understudied cancer types. Integrating NLP tools into palliative medicine and addressing ethical considerations remain crucial to realizing the full potential of NLP for improving cancer diagnosis, treatment, and patient outcomes. This review summarizes the current state and future directions of NLP applications in cancer research.
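The task counts reported above sum to the 94 included studies (47 + 40 + 7 = 94). This suggests, though the abstract does not state it, that each study was assigned a single primary NLP task. The short sketch below works under that assumption and converts the counts into shares of the corpus. The dictionary and the helper `task_shares` are hypothetical illustrations and are not part of the review's method.

```python
# Counts of primary NLP task per study, taken from the abstract.
# ASSUMPTION: each of the 94 studies was assigned exactly one primary task.
task_counts = {
    "information extraction": 47,
    "text classification": 40,
    "named entity recognition": 7,
}


def task_shares(counts):
    """Hypothetical helper: each task's share of all studies, in percent (1 decimal)."""
    total = sum(counts.values())
    return {task: round(100 * n / total, 1) for task, n in counts.items()}


total_studies = sum(task_counts.values())  # 94 under the one-task-per-study assumption
shares = task_shares(task_counts)

if __name__ == "__main__":
    print(f"Total studies: {total_studies}")
    for task, pct in shares.items():
        print(f"{task}: {task_counts[task]} studies ({pct}%)")
```

Under this reading, information extraction accounts for half of the included studies, text classification for roughly 43%, and named entity recognition for roughly 7%.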