Chen David, Avison Kate, Alnassar Saif, Huang Ryan S, Raman Srinivas
Princess Margaret Hospital Cancer Centre, Radiation Medicine Program, Toronto, ON M5G 2C4, Canada.
Temerty Faculty of Medicine, University of Toronto, Toronto, ON M5S 3K3, Canada.
Oncologist. 2025 Apr 4;30(4). doi: 10.1093/oncolo/oyaf038.
Recent advances in large language models (LLMs) have enabled human-like natural language competency. Applied to oncology, LLMs have been proposed to serve as an information resource and as a clinical decision-support tool that interprets vast amounts of data to improve clinical outcomes.
This review describes the current state of medical accuracy of oncology-related LLM applications and research trends, and identifies areas for further investigation.
A scoping literature search was conducted on Ovid Medline for peer-reviewed studies published since 2000. We included primary research studies that evaluated the medical accuracy of a large language model applied in oncology settings. Study characteristics and primary outcomes were extracted to describe the landscape of oncology-related LLMs.
Sixty studies met the inclusion and exclusion criteria. The majority evaluated LLMs in oncology as a health information resource in question-and-answer-style examinations (48%), followed by diagnosis (20%) and management (17%). The number of studies evaluating the utility of fine-tuning and prompt engineering of LLMs increased from 2022 to 2024. Reported advantages of LLMs included accuracy as an information resource, reduced clinician workload, and improved accessibility and readability of clinical information; reported disadvantages included poor reliability, hallucinations, and the need for clinician oversight.
There is significant interest in the application of LLMs in clinical oncology, particularly as a medical information resource and clinical decision-support tool. However, further research is needed to validate these tools on external hold-out datasets for generalizability and to improve their medical accuracy across diverse clinical scenarios, underscoring the need for clinician supervision of these tools.