Automating the optimization of proton PBS treatment planning for head and neck cancers using policy gradient-based deep reinforcement learning.

Author information

Wang Qingqing, Chang Chang

Affiliations

Department of Radiation Medicine and Applied Sciences, University of California at San Diego, La Jolla, California, USA.

California Protons Cancer Therapy Center, San Diego, California, USA.

Publication information

Med Phys. 2025 Apr;52(4):1997-2014. doi: 10.1002/mp.17654. Epub 2025 Jan 31.

BACKGROUND

Proton pencil beam scanning (PBS) treatment planning for head and neck (H&N) cancers is a time-consuming task that demands considerable experience and involves a large number of potentially conflicting planning objectives. Deep reinforcement learning (DRL) has recently been introduced to the planning processes of intensity-modulated radiation therapy (IMRT) and brachytherapy for prostate, lung, and cervical cancers. However, existing DRL planning models are built upon the Q-learning framework and rely on weighted linear combinations of clinical metrics for reward calculation. These approaches suffer from poor scalability and flexibility: they can adjust only a limited number of planning objectives in discrete action spaces and therefore fail to generalize to more complex planning problems.

PURPOSE

Here we propose an automatic treatment planning model using the proximal policy optimization (PPO) algorithm in the policy gradient framework of DRL and a dose distribution-based reward function for proton PBS treatment planning of H&N cancers.
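
For reference, the clipped surrogate objective that PPO maximizes is commonly written as shown here. This is the standard general form of PPO, not a formula taken from this paper; the abstract does not state the clipping parameter or the advantage estimator the authors use, so the symbols below are generic.

```latex
L^{\mathrm{CLIP}}(\theta)
  = \hat{\mathbb{E}}_t\!\left[
      \min\!\Big( r_t(\theta)\,\hat{A}_t,\;
                  \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \Big)
    \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

Here $\pi_\theta$ is the policy, $a_t$ and $s_t$ are the action and state at step $t$, $\hat{A}_t$ is an advantage estimate, and $\epsilon$ is the clipping range. Because the objective depends only on the probability ratio $r_t(\theta)$, it applies to continuous action distributions as well as discrete ones.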

METHODS

The planning process is formulated as an optimization problem. A set of empirical rules is used to create auxiliary planning structures from target volumes and organs-at-risk (OARs), along with their associated planning objectives. Special attention is given to overlapping structures with potentially conflicting objectives. These planning objectives are fed into an in-house optimization engine to generate the spot monitor unit (MU) values. A decision-making policy network trained using PPO is developed to iteratively adjust the involved planning objective parameters. The policy network predicts actions in a continuous action space and guides the treatment planning system to refine the PBS treatment plans using a novel dose distribution-based reward function. A total of 34 H&N patients (30 for training and 4 for testing) and 26 liver patients (20 for training and 6 for testing) are included in this study to train the proposed method and verify its effectiveness and generalizability.
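
The iterative adjustment described above might be pictured with a toy sketch like the one below. It is only an illustration under stated assumptions, not the authors' implementation: the diagonal Gaussian policy, the multiplicative `exp(action)` update rule, the clamping bounds, and the names `sample_action`, `apply_action`, and `refine` are all hypothetical. The `policy` callable stands in for the trained policy network, and the in-house optimization engine and dose-based reward are omitted.

```python
import math
import random


def sample_action(mean, log_std, rng):
    """Draw one continuous action vector from a diagonal Gaussian policy."""
    return [rng.gauss(m, math.exp(ls)) for m, ls in zip(mean, log_std)]


def apply_action(params, action, lower=1e-3, upper=1e3):
    """Scale each planning-objective parameter by exp(action), then clamp.

    The multiplicative rule and the bounds are assumptions for illustration.
    """
    return [min(upper, max(lower, p * math.exp(a))) for p, a in zip(params, action)]


def refine(params, policy, n_steps, rng):
    """Repeatedly sample an action from the policy and update the parameters.

    In a real workflow the plan would be re-optimized after every update and
    the policy input would be derived from the resulting dose distribution;
    here the policy simply sees the current parameter vector.
    """
    for _ in range(n_steps):
        mean, log_std = policy(params)  # stand-in for the trained network
        params = apply_action(params, sample_action(mean, log_std, rng))
    return params


if __name__ == "__main__":
    # Hypothetical policy: nudge every parameter upward with small noise.
    toy_policy = lambda p: ([0.1] * len(p), [-2.0] * len(p))
    print(refine([1.0, 5.0, 0.5], toy_policy, n_steps=5, rng=random.Random(0)))
```

Every parameter receives its own real-valued adjustment at each step, which is the practical difference from a discrete action space where the agent chooses one predefined adjustment from a fixed menu.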

RESULTS

Proton H&N treatment plans generated by the model show improved OAR sparing with equal or superior target coverage when compared with human-generated plans. Moreover, additional experiments on liver cancer demonstrate that the proposed method can be successfully generalized to other treatment sites.

CONCLUSIONS

The automatic treatment planning model can generate complex H&N plans with quality comparable or superior to those produced by experienced human planners. Compared with existing works, our method is capable of handling more planning objectives in continuous action spaces. To the best of our knowledge, this is the first DRL-based automatic treatment planning model capable of achieving human-level performance for H&N cancers.
