Department of Physics, The University of Texas at Arlington, Arlington, TX 76019, United States of America.
Innovative Technology of Radiotherapy Computation and Hardware (iTORCH) Laboratory, Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX 75287, United States of America.
Biomed Phys Eng Express. 2022 Jun 3;8(4). doi: 10.1088/2057-1976/ac6d82.
Although commercial treatment planning systems (TPSs) can automatically solve the optimization problem for treatment planning, human planners need to define and adjust the planning objectives/constraints to obtain clinically acceptable plans. Such a process is labor-intensive and time-consuming. In this work, we present an end-to-end study to train a deep reinforcement learning (DRL) based virtual treatment planner (VTP) that can behave like a human planner to operate a dose-volume constrained treatment plan optimization engine, following the parameters used in the Eclipse TPS, for high-quality treatment planning. We considered prostate cancer IMRT treatment planning as the testbed. The VTP took the dose-volume histogram (DVH) of a plan as input and predicted the optimal strategy for constraint adjustment to improve the plan quality. The training of the VTP followed the state-of-the-art Q-learning framework. Experience replay was implemented with epsilon-greedy search to explore the impacts of taking different actions on a large number of automatically generated plans, from which an optimal policy could be learned. Since a major computational cost in training was solving the plan optimization problem repeatedly, we implemented a graphical processing unit (GPU)-based technique to improve the efficiency by 2-fold. Upon the completion of training, the established VTP was deployed to plan for an independent set of 50 testing patient cases. Connecting the established VTP with the Eclipse workstation via the application programming interface, we tested the performance of the VTP in operating the Eclipse TPS for automatic treatment planning with another two independent patient cases. Like a human planner, the VTP kept adjusting the planning objectives/constraints to improve plan quality until the plan was acceptable or the maximum number of adjustment steps was reached under both scenarios. The generated plans were evaluated using the ProKnow scoring system.
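The Q-learning scheme described above (experience replay plus epsilon-greedy exploration over constraint-adjustment actions, with a DVH-derived state) can be illustrated with a minimal sketch. This is a toy surrogate, not the authors' implementation: the environment functions `plan_score` and `adjust_constraint`, the linear Q-function, and all dimensions and hyperparameters are illustrative assumptions standing in for the actual plan optimization engine and ProKnow scoring.

```python
import random
from collections import deque

import numpy as np

# Toy surrogate for the DRL-based VTP training loop (all names/sizes are
# illustrative assumptions, not the paper's actual components).
N_FEATURES = 8   # length of the DVH-derived state vector (assumption)
N_ACTIONS = 4    # e.g. raise/lower two dose-volume constraints (assumption)

rng = np.random.default_rng(0)

def plan_score(state):
    """Stand-in for the plan-quality score (NOT the real ProKnow metric)."""
    return -float(np.sum((state - 1.0) ** 2))

def adjust_constraint(state, action):
    """Stand-in for re-optimizing the plan after one constraint adjustment."""
    delta = np.zeros(N_FEATURES)
    delta[action % N_FEATURES] = 0.1 if action < N_ACTIONS // 2 else -0.1
    return state + delta

# Linear Q-function approximation: Q(s, a) = W[a] @ s
W = np.zeros((N_ACTIONS, N_FEATURES))
replay = deque(maxlen=1000)          # experience replay buffer
epsilon, gamma, lr = 0.2, 0.9, 0.01  # exploration rate, discount, step size

for episode in range(200):
    state = rng.uniform(0.0, 2.0, N_FEATURES)  # random initial "plan"
    for step in range(10):
        # epsilon-greedy action selection
        if rng.random() < epsilon:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(W @ state))

        # take the action; reward is the resulting change in plan score
        next_state = adjust_constraint(state, action)
        reward = plan_score(next_state) - plan_score(state)
        replay.append((state, action, reward, next_state))
        state = next_state

        # sample a minibatch from the replay buffer and take a Q-learning step
        batch = random.sample(replay, min(16, len(replay)))
        for s, a, r, s2 in batch:
            target = r + gamma * float(np.max(W @ s2))
            td_error = target - float(W[a] @ s)
            W[a] += lr * td_error * s
```

In the actual system, `adjust_constraint` corresponds to re-solving the dose-volume constrained optimization (the step accelerated on GPU), the state is the plan's DVH, and the Q-function is a deep network rather than this linear sketch.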
The mean plan score (± standard deviation) of the 50 testing cases was improved from 6.18 ± 1.75 to 8.14 ± 1.27 by the VTP, with 9 being the maximal score. As for the two cases under Eclipse dose optimization, the plan scores were improved from 8 to 8.4 and 8.7, respectively, by the VTP. These results indicated that the proposed DRL-based VTP was able to operate the in-house dose-volume constrained TPS and the Eclipse TPS to automatically generate high-quality treatment plans for prostate cancer IMRT.