Innovative Technology Of Radiotherapy Computation and Hardware (iTORCH) Laboratory, Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX 75287, United States of America. Medical Artificial Intelligence and Automation (MAIA) Laboratory, Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX 75287, United States of America.
Phys Med Biol. 2019 May 29;64(11):115013. doi: 10.1088/1361-6560/ab18bf.
Inverse treatment planning in radiation therapy is formulated as solving optimization problems. The objective function and constraints consist of multiple terms designed for different clinical and practical considerations. Weighting factors of these terms are needed to define the optimization problem. While a treatment planning optimization engine can solve the optimization problem with given weights, adjusting the weights to yield a high-quality plan is typically performed by a human planner. Yet the weight-tuning task is labor intensive and time consuming, and it critically affects the final plan quality. An automatic weight-tuning approach is therefore strongly desired. The procedure of adjusting weights to improve plan quality is essentially a decision-making problem. Motivated by the tremendous success of deep learning in decision-making tasks with human-level intelligence, we propose a novel framework to adjust the weights in a human-like manner. This study used inverse treatment planning in high-dose-rate brachytherapy (HDRBT) for cervical cancer as an example. We developed a weight-tuning policy network (WTPN) that observes the dose volume histograms of a plan and outputs an action to adjust organ weighting factors, similar to the behaviors of a human planner. We trained the WTPN via end-to-end deep reinforcement learning, using experience replay together with an epsilon-greedy exploration strategy. After training was completed, we applied the trained WTPN to guide treatment planning of five testing patient cases. It was found that the trained WTPN successfully learnt the treatment planning goals and was able to guide the weight-tuning process. On average, the quality score of plans generated under the WTPN's guidance was improved by ~8.5% compared to the initial plan with arbitrarily set weights, and by 10.7% compared to the plans generated by human planners.
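The core mechanics described above (a discrete set of organ-weight adjustment actions, epsilon-greedy exploration, and an experience replay buffer) can be sketched minimally as follows. This is an illustrative reconstruction under stated assumptions, not the paper's implementation: the organ names, the multiplicative adjustment step, and the buffer size are hypothetical, and the Q-network that would score state-action pairs is omitted.

```python
import random
from collections import deque

import numpy as np

# Hypothetical organs-at-risk and adjustment step; illustrative only,
# not taken from the paper.
ORGANS = ["bladder", "rectum", "sigmoid"]
STEP = 1.5  # multiplicative weight adjustment per action


def candidate_actions(weights):
    """Enumerate discrete actions: scale one organ's weight up or down."""
    out = []
    for organ in ORGANS:
        for factor in (STEP, 1.0 / STEP):
            new = dict(weights)
            new[organ] *= factor
            out.append(new)
    return out


def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise pick the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return int(np.argmax(q_values))


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng=random):
        return rng.sample(self.buffer, batch_size)
```

In a training loop, the state would be the DVH features of the current plan, `q_values` would come from the WTPN's forward pass over the candidate actions, and the reward would be the change in the plan quality score after re-optimizing with the adjusted weights.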
To our knowledge, this was the first time that a tool was developed to adjust organ weights for the treatment planning optimization problem in a human-like fashion, based on intelligence learnt from a training process rather than on pre-defined rules as in existing strategies. The study demonstrated the potential feasibility of developing intelligent treatment planning approaches via deep reinforcement learning.