Wei Shuoyang, Hu Ankang, Liang Yongguang, Yang Jingru, Yu Lang, Li Wenbo, Yang Bo, Qiu Jie
Department of Radiotherapy, Peking Union Medical College Hospital, Beijing, 100730, China.
Department of Engineering Physics, Tsinghua University, Beijing, 100084, China.
Radiat Oncol. 2025 May 15;20(1):77. doi: 10.1186/s13014-025-02660-5.
Radiotherapy treatment planning has traditionally been a complex, time-consuming process that often relies on trial and error. The emergence of artificial intelligence, particularly large language models (LLMs), which have surpassed human performance and existing algorithms in various domains, presents an opportunity to automate and improve this optimization process.
This study evaluates whether LLMs can generate radiotherapy treatment plans comparable to those created by human medical physicists, focusing on target volume conformity and dose sparing of organs at risk (OARs). The goal is to automate radiotherapy treatment plan optimization using LLMs.
Multiple LLMs were used to adjust the optimization parameters of radiotherapy treatment plans, using a dataset of 35 cervical cancer patients treated with volumetric modulated arc therapy (VMAT). Customized prompts were developed on 5 patients to tailor the LLMs, which were then tested on the remaining 30 patients. Evaluation metrics included target volume conformity, dose homogeneity, monitor units (MU), and OAR dose sparing, and plans generated by the various LLMs were compared with manual plans.
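The abstract does not describe how the LLMs exchange information with the treatment planning system. A minimal sketch of one plausible feedback loop follows, assuming the model receives the current objective table and plan metrics as text and replies with updated objectives as JSON. The functions `build_prompt`, `parse_reply`, and `optimize_with_llm`, the objective keys, and the toy evaluator and model stubs are all hypothetical; they are not the authors' implementation or any TPS or model API. The validation in `parse_reply` illustrates one way to contain malformed or hallucinated replies, such as those reported for Gemini-1.5-flash.

```python
import json
from typing import Callable

# Hypothetical objective row; the keys are illustrative, not a real TPS schema.
Objective = dict


def build_prompt(objectives: list[Objective], metrics: dict[str, float]) -> str:
    """Serialize current objectives and plan metrics into a text prompt (format assumed)."""
    return (
        "Adjust these VMAT optimization objectives to improve the plan.\n"
        f"Objectives: {json.dumps(objectives)}\n"
        f"Metrics: {json.dumps(metrics)}\n"
        "Reply only with a JSON list of objectives using the same keys."
    )


def parse_reply(reply: str, old: list[Objective]) -> list[Objective]:
    """Accept the model's reply only if well-formed; otherwise keep the old objectives."""
    try:
        new = json.loads(reply)
    except json.JSONDecodeError:
        return old  # non-JSON text: reject
    if not isinstance(new, list) or len(new) != len(old):
        return old
    for n, o in zip(new, old):
        if not isinstance(n, dict) or n.get("structure") != o["structure"]:
            return old  # reordered or invented structure: reject
        p = n.get("priority")
        if isinstance(p, bool) or not isinstance(p, (int, float)) or p < 0:
            return old  # missing or nonsensical priority: reject
    # Only priorities are taken from the model; other fields stay as the planner set them.
    return [{**o, "priority": float(n["priority"])} for n, o in zip(new, old)]


def optimize_with_llm(
    objectives: list[Objective],
    evaluate: Callable[[list[Objective]], dict[str, float]],
    ask_llm: Callable[[str], str],
    is_acceptable: Callable[[dict[str, float]], bool],
    max_rounds: int = 10,
) -> tuple[list[Objective], dict[str, float], int]:
    """Evaluate the plan, ask the LLM for new objectives, and repeat until acceptable."""
    for rounds in range(max_rounds):
        metrics = evaluate(objectives)  # stand-in for TPS optimization + dose calculation
        if is_acceptable(metrics):
            return objectives, metrics, rounds
        objectives = parse_reply(ask_llm(build_prompt(objectives, metrics)), objectives)
    return objectives, evaluate(objectives), max_rounds


# Toy stand-ins so the loop runs without a TPS or a model API (purely illustrative).
def toy_evaluate(objs: list[Objective]) -> dict[str, float]:
    rectum = next(o for o in objs if o["structure"] == "Rectum")
    # Pretend a higher rectum priority lowers rectum V40 linearly.
    return {"rectum_V40_pct": max(0.0, 60.0 - rectum["priority"] / 10)}


def toy_llm(prompt: str) -> str:
    objs = json.loads(prompt.split("Objectives: ")[1].split("\n")[0])
    for o in objs:
        if o["structure"] == "Rectum":
            o["priority"] += 50  # "reasoning": raise the rectum priority
    return json.dumps(objs)


START_OBJECTIVES = [
    {"structure": "PTV", "type": "min_dose", "dose_gy": 45.0, "priority": 100},
    {"structure": "Rectum", "type": "max_dvh", "dose_gy": 40.0, "volume_pct": 50.0, "priority": 50},
]

if __name__ == "__main__":
    final, metrics, rounds = optimize_with_llm(
        START_OBJECTIVES, toy_evaluate, toy_llm, lambda m: m["rectum_V40_pct"] <= 50.0
    )
    print(rounds, metrics)
```

Rejecting a malformed reply, rather than trying to repair it, keeps the previous valid objectives in place so that one bad model response cannot corrupt the plan.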
Except for Gemini-1.5-flash, whose output was compromised by hallucinations, the LLMs produced acceptable VMAT plans: Qwen-2.5-max and Llama-3.2 required 16.3 ± 5.0 min and 9.8 ± 2.1 min, respectively, compared with about 20 min for an experienced human physicist. On the test set, the mean conformity index (CI) was 0.929 ± 0.007 for Qwen-2.5-max plans, 0.928 ± 0.007 for Llama-3.2 plans, and 0.926 ± 0.007 for manual plans; the mean homogeneity index (HI) was 0.058 ± 0.006, 0.059 ± 0.005, and 0.065 ± 0.006, respectively. Target volume conformity differed significantly between LLM plans and manual plans, whereas OAR dose sparing did not. In head-to-head comparisons between LLMs, no statistically significant differences in planning target volume (PTV) dose, OAR dose sparing, or target volume conformity were observed between Qwen-2.5-max and Llama-3.2 plans.
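The abstract reports CI and HI without defining them. Assuming the widely used Paddick conformity index, CI = TV_PIV² / (TV × PIV), and the ICRU Report 83 homogeneity index, HI = (D2% − D98%) / D50% (both are assumptions; the study may use other definitions), the sketch below computes both from voxel doses on a uniform grid. Under these definitions, a CI closer to 1 and a lower HI indicate a more conformal and more homogeneous plan. The function names are hypothetical.

```python
import numpy as np


def paddick_ci(dose, target, rx_gy: float) -> float:
    """Paddick CI = TV_PIV^2 / (TV * PIV), using voxel counts on a uniform grid."""
    dose = np.asarray(dose, dtype=float)
    target = np.asarray(target, dtype=bool)
    piv = dose >= rx_gy                       # voxels inside the prescription isodose
    tv = np.count_nonzero(target)             # target volume (in voxels)
    piv_n = np.count_nonzero(piv)             # prescription isodose volume
    tv_piv = np.count_nonzero(target & piv)   # target volume covered by the isodose
    if tv == 0 or piv_n == 0:
        return 0.0
    return tv_piv ** 2 / (tv * piv_n)


def icru83_hi(dose, target) -> float:
    """ICRU 83 HI = (D2% - D98%) / D50%, computed over target voxels."""
    d = np.asarray(dose, dtype=float)[np.asarray(target, dtype=bool)]
    # D2% (near-maximum) is the 98th percentile; D98% (near-minimum) is the 2nd.
    d2, d98, d50 = np.percentile(d, [98, 2, 50])
    return float((d2 - d98) / d50)


# Toy 1-D example: 5 target voxels followed by 5 normal-tissue voxels, 45 Gy prescription.
dose = [50, 50, 50, 50, 40, 50, 30, 30, 30, 30]
target = [True] * 5 + [False] * 5
print(paddick_ci(dose, target, 45.0))  # 4 of 5 target voxels covered, 1 spill voxel
```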
By comparing LLM-generated plans with clinical plans in terms of target volume conformity and OAR dose sparing, this study provides preliminary evidence that LLMs are viable for optimizing radiotherapy treatment plans. LLMs show potential to streamline clinical workflows and reduce the workload associated with treatment planning.