Mehan Namya, Desinghe Teshan Dias, Saha Ashirbani
Integrated Biomedical Engineering and Health Sciences, McMaster University, Hamilton, Ontario, Canada.
Global Health Program, Faculty of Health Sciences, McMaster University, Hamilton, Ontario, Canada.
PLOS Digit Health. 2025 Aug 7;4(8):e0000980. doi: 10.1371/journal.pdig.0000980. eCollection 2025 Aug.
Large language models (LLMs), a significant development in artificial intelligence (AI), continue to demonstrate substantial improvements in performance on various text analysis and generation tasks. There are few systematic studies of LLM applications developed or evaluated for oncology. Our scoping review explores applications of LLMs in oncology to determine (1) the nature of LLM applications relevant to a cancer/tumor type, (2) the phases of cancer care addressed by the LLMs, (3) which LLMs were used in these applications, (4) the sources and pre-processing of datasets used, (5) the techniques used to optimize the performance of LLMs, (6) the methods of evaluation, and (7) the common limitations noted by the authors of these LLM applications, and to study their implications for research and practice. A librarian-assisted search was performed across the following databases: Association for Computing Machinery (ACM), Embase, Engineering Village, IEEE Xplore, Medline, Scopus, SPIE and Web of Science, through Jan 12, 2024. Pre-prints from this search were considered if they were published or accepted by Feb 29, 2024. Of the 14,863 articles from the initial search, 60 were finally included. Our results demonstrated that LLMs were mostly evaluated across a diverse set of oncology-related applications. Generative pre-trained transformer (GPT)-based LLMs were the most commonly used. In the subset of studies where the phase(s) of cancer care was/were provided or implied, treatment and diagnosis were the most frequently included phases. Data for development and evaluation ranged from patient health records, synthetic patient records, and research and professional society publications to social media. Prompt design and engineering were performed as data pre-processing steps in several studies. Clinicians, trainees, researchers, and patients were among the variety of users targeted by the applications. 
In the 17% of studies that developed LLMs for oncological aspects, domain adaptation through pre-training and fine-tuning was often performed and improved performance. The evaluation of an LLM's performance involved standard, validated, non-standardized, and/or customized performance measures covering a variety of constructs beyond accuracy. Six primary themes emerged as limitations, including limited generalizability/applicability, sample size, bias and subjectivity, and evaluation metrics. This review highlights that LLMs specific to oncological aspects are less common than general-purpose LLMs. The application areas were heterogeneous, used diverse data sources, were directed towards a variety of users, and resulted in a variety of evaluation methods. Despite the diversity of LLM applications in oncology, future research needs to address the limited generalizability of these applications, the mitigation of bias and subjectivity, and the standardization of evaluation methodologies. Future applications of LLMs in oncology should include developing oncology-specific LLMs that can mitigate knowledge gaps and extend to diverse areas of oncology training and practice not considered so far.