Madondo Malvern, Shao Yuan, Liu Yingzi, Zhou Jun, Yang Xiaofeng, Tian Zhen
Department of Radiation and Cellular Oncology, University of Chicago, Chicago, IL, USA.
Division of Environmental and Occupational Health Sciences, University of Illinois at Chicago, Chicago, IL, USA.
ArXiv. 2025 Aug 11:arXiv:2506.10073v2.
Anatomical changes in head-and-neck cancer (HNC) patients during intensity-modulated proton therapy (IMPT) can shift the Bragg peak of proton beams, risking tumor underdosing and organ-at-risk (OAR) overdosing. As a result, treatment replanning is often required to maintain clinically acceptable treatment quality. However, current manual replanning processes are resource-intensive and time-consuming. In this work, we propose a patient-specific deep reinforcement learning (DRL) framework for automated IMPT replanning, with a reward-shaping mechanism based on a 150-point plan quality score designed to handle competing clinical objectives in radiotherapy planning. We formulate the planning process as a reinforcement learning (RL) problem in which agents learn high-dimensional control policies that adjust plan optimization priorities to maximize plan quality. Unlike population-based approaches, our framework trains personalized agents for each patient using their planning computed tomography (CT) and augmented anatomies simulating anatomical changes (tumor progression and regression). This patient-specific approach leverages anatomical similarities along the treatment course, enabling effective plan adaptation. We implemented and compared two DRL algorithms, Deep Q-Network (DQN) and Proximal Policy Optimization (PPO), using dose-volume histograms (DVHs) as state representations and a 22-dimensional action space of priority adjustments. Evaluation on eight HNC patients using actual replanning CT data showed that both DRL agents improved initial plan scores from 120.78 ± 17.18 to 139.59 ± 5.50 (DQN) and 141.50 ± 4.69 (PPO), surpassing the replans manually generated by a human planner (136.32 ± 4.79). Further comparison of dosimetric endpoints confirms that these improvements translate to better tumor coverage and OAR sparing across diverse anatomical changes.
This work highlights the potential of DRL in addressing the geometric and dosimetric complexities of adaptive proton therapy, offering a promising solution for efficient offline adaptation and paving the way for online adaptive proton therapy.
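The RL formulation summarized above (state = DVH curves, action = a 22-dimensional vector of priority adjustments, reward shaped by changes in a 150-point plan-quality score) can be illustrated with a minimal toy environment. Everything below is a hypothetical sketch for exposition only: the class name, the surrogate scoring model, and the oracle policy stand in for the authors' treatment-planning system and trained DQN/PPO agents, none of which are reproduced here.

```python
# Toy sketch of the abstract's RL formulation. All names and the scoring
# surrogate are hypothetical illustrations, not the authors' implementation.
import numpy as np

N_PRIORITIES = 22   # action-space dimensionality (per the abstract)
DVH_BINS = 50       # hypothetical number of dose bins in the DVH state
MAX_SCORE = 150.0   # plan-quality score ceiling (per the abstract)

class ToyReplanEnv:
    """Toy IMPT replanning environment: state = a flattened DVH curve,
    action = priority adjustments, reward = change in plan score."""

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        # Hidden "ideal" priority vector the agent should approach.
        self.target = self.rng.uniform(-1.0, 1.0, N_PRIORITIES)

    def reset(self):
        self.priorities = np.zeros(N_PRIORITIES)
        self.score = self._plan_score()
        return self._dvh_state()

    def _plan_score(self):
        # Toy surrogate: score decays with distance from the ideal priorities.
        dist = np.linalg.norm(self.priorities - self.target)
        return MAX_SCORE * np.exp(-0.5 * dist)

    def _dvh_state(self):
        # Toy DVH: a sigmoid dose-volume curve whose steepness tracks the score.
        dose = np.linspace(0.0, 1.2, DVH_BINS)
        steepness = 5.0 + 20.0 * self.score / MAX_SCORE
        return 1.0 / (1.0 + np.exp(steepness * (dose - 1.0)))

    def step(self, action):
        self.priorities = np.clip(self.priorities + action, -2.0, 2.0)
        new_score = self._plan_score()
        reward = new_score - self.score  # reward shaped by score change
        self.score = new_score
        done = self.score >= 0.95 * MAX_SCORE
        return self._dvh_state(), reward, done

env = ToyReplanEnv()
state = env.reset()
# Greedy oracle steps stand in for the learned DQN/PPO policy.
for _ in range(200):
    action = 0.1 * (env.target - env.priorities)  # illustration only
    state, reward, done = env.step(action)
    if done:
        break
print(f"final plan score: {env.score:.1f} / {MAX_SCORE:.0f}")
```

In the paper's setting, the oracle step above would be replaced by a trained policy network, and the surrogate score by the 150-point clinical plan-quality score evaluated on the optimized dose distribution.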