
Automating the optimization of proton PBS treatment planning for head and neck cancers using policy gradient-based deep reinforcement learning.

Authors

Wang Qingqing, Chang Chang

Affiliations

Department of Radiation Medicine and Applied Sciences, University of California at San Diego, La Jolla, California, USA.

California Protons Cancer Therapy Center, San Diego, California, USA.

Publication information

Med Phys. 2025 Apr;52(4):1997-2014. doi: 10.1002/mp.17654. Epub 2025 Jan 31.

DOI:10.1002/mp.17654
PMID:39887764
Abstract

BACKGROUND

Proton pencil beam scanning (PBS) treatment planning for head and neck (H&N) cancers is a time-consuming and experience-demanding task where a large number of potentially conflicting planning objectives are involved. Deep reinforcement learning (DRL) has recently been introduced to the planning processes of intensity-modulated radiation therapy (IMRT) and brachytherapy for prostate, lung, and cervical cancers. However, existing DRL planning models are built upon the Q-learning framework and rely on weighted linear combinations of clinical metrics for reward calculation. These approaches suffer from poor scalability and flexibility, that is, they are only capable of adjusting a limited number of planning objectives in discrete action spaces and therefore fail to generalize to more complex planning problems.
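To make the scalability point concrete, here is a minimal sketch of a weighted-linear reward and a discrete action set. The metric names, weights, and scaling factors are hypothetical, and the "pick one parameter, scale it by a fixed factor" discretization is only one common choice, assumed here for illustration.

```python
from itertools import product

def weighted_linear_reward(metrics, weights):
    """Reward as a weighted sum of scalar clinical metrics (names are hypothetical)."""
    return sum(weights[name] * value for name, value in metrics.items())

def discrete_actions(n_params, factors=(0.5, 2.0)):
    """Each action picks ONE parameter and scales it by ONE fixed factor."""
    return list(product(range(n_params), factors))

# Illustrative numbers only; they are not taken from the paper.
metrics = {"target_coverage": 0.95, "oar_mean_dose_penalty": -0.30}
weights = {"target_coverage": 1.0, "oar_mean_dose_penalty": 0.5}

reward = weighted_linear_reward(metrics, weights)
actions = discrete_actions(n_params=3)
```

In this sketch every added planning objective needs another hand-tuned weight in the reward and adds more entries to the action list, and each step can change only one parameter.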

PURPOSE

Here we propose an automatic treatment planning model using the proximal policy optimization (PPO) algorithm in the policy gradient framework of DRL and a dose distribution-based reward function for proton PBS treatment planning of H&N cancers.
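For reference, the standard PPO clipped surrogate objective is written out below. The abstract does not state the clipping range $\epsilon$, the advantage estimator $\hat{A}_t$, or any additional loss terms used in this work, so those remain unspecified here.

```latex
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
\qquad
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[
  \min\!\left( r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t \right)
\right]
```

Here $s_t$ would be the current plan state and $a_t$ the adjustment to the planning objective parameters. Clipping the probability ratio $r_t(\theta)$ limits how far one update can move the policy away from the policy that collected the data.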

METHODS

The planning process is formulated as an optimization problem. A set of empirical rules is used to create auxiliary planning structures from target volumes and organs-at-risk (OARs), along with their associated planning objectives. Special attention is given to overlapping structures with potentially conflicting objectives. These planning objectives are fed into an in-house optimization engine to generate the spot monitor unit (MU) values. A decision-making policy network trained using PPO is developed to iteratively adjust the involved planning objective parameters. The policy network predicts actions in a continuous action space and guides the treatment planning system to refine the PBS treatment plans using a novel dose distribution-based reward function. A total of 34 H&N patients (30 for training and 4 for test) and 26 liver patients (20 for training, 6 for test) are included in this study to train and verify the effectiveness and generalizability of the proposed method.
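The iterative adjustment loop described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `toy_optimize`, `plan_score`, and `policy_mean` are hypothetical stand-ins for the in-house optimization engine, the dose distribution-based reward, and the policy network, and all numeric values are arbitrary.

```python
import math
import random

PRESCRIPTION = 60.0  # Gy, illustrative value only

def toy_optimize(params):
    """Hypothetical stand-in for the optimization engine.

    Maps two objective weights (target, OAR) to toy mean doses; a real engine
    would return spot MU values and a full 3D dose distribution.
    """
    w_target, w_oar = params
    share = w_target / (w_target + w_oar)
    return {"target": 70.0 * share, "oar": 40.0 * share}

def plan_score(dose):
    """Toy dose-based score: penalize target deviation and OAR dose."""
    return -abs(dose["target"] - PRESCRIPTION) - 0.5 * dose["oar"]

def policy_mean(dose, theta):
    """Linear stand-in for the policy network: plan state -> mean of each action."""
    state = [(PRESCRIPTION - dose["target"]) / PRESCRIPTION, dose["oar"] / 40.0]
    return [sum(w * s for w, s in zip(row, state)) for row in theta]

def sample_action(mean, std, rng):
    """Draw a continuous action from a diagonal Gaussian and return its log-probability."""
    action = [rng.gauss(m, std) for m in mean]
    log_prob = sum(
        -0.5 * ((a - m) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)
        for a, m in zip(action, mean)
    )
    return action, log_prob

def rollout(theta, std=0.1, steps=5, seed=0):
    """Adjust all objective parameters each step; record (action, log_prob, reward)."""
    rng = random.Random(seed)
    params = [1.0, 1.0]
    dose = toy_optimize(params)
    score = plan_score(dose)
    trajectory = []
    for _ in range(steps):
        action, log_prob = sample_action(policy_mean(dose, theta), std, rng)
        # Multiplicative update keeps every parameter positive.
        params = [p * math.exp(a) for p, a in zip(params, action)]
        dose = toy_optimize(params)
        new_score = plan_score(dose)
        trajectory.append((action, log_prob, new_score - score))  # reward = score gain
        score = new_score
    return params, trajectory

theta = [[1.0, 0.0], [0.0, -1.0]]  # arbitrary, untrained policy weights
final_params, trajectory = rollout(theta)
```

Each action is a real-valued vector that updates every objective parameter in the same step, which is what a continuous action space allows. The recorded log-probabilities and rewards are the quantities a PPO update would consume; the update itself is omitted from this sketch.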

RESULTS

Proton H&N treatment plans generated by the model show improved OAR sparing with equal or superior target coverage when compared with human-generated plans. Moreover, additional experiments on liver cancer demonstrate that the proposed method can be successfully generalized to other treatment sites.

CONCLUSIONS

The automatic treatment planning model can generate complex H&N plans with quality comparable or superior to those produced by experienced human planners. Compared with existing works, our method is capable of handling more planning objectives in continuous action spaces. To the best of our knowledge, this is the first DRL-based automatic treatment planning model capable of achieving human-level performance for H&N cancers.


