Zhang Jiahan, Wang Chunhao, Sheng Yang, Palta Manisha, Czito Brian, Willett Christopher, Zhang Jiang, Jensen P James, Yin Fang-Fang, Wu Qiuwen, Ge Yaorong, Wu Q Jackie
Department of Radiation Oncology, Duke University Medical Center, Durham, North Carolina.
Int J Radiat Oncol Biol Phys. 2021 Mar 15;109(4):1076-1085. doi: 10.1016/j.ijrobp.2020.10.019. Epub 2020 Oct 25.
Pancreas stereotactic body radiation therapy (SBRT) treatment planning requires planners to make sequential, time-consuming interactions with the treatment planning system to reach the optimal dose distribution. We sought to develop a reinforcement learning (RL)-based planning bot to systematically address complex tradeoffs and achieve high plan quality consistently and efficiently.
The focus of pancreas SBRT planning is finding a balance between organ-at-risk sparing and planning target volume (PTV) coverage. Planners evaluate dose distributions and make planning adjustments to optimize PTV coverage while adhering to organ-at-risk dose constraints. We formulated such interactions between the planner and treatment planning system into a finite-horizon RL model. First, planning status features were evaluated based on human planners' experience and defined as planning states. Second, planning actions were defined to represent steps that planners would commonly implement to address different planning needs. Finally, we derived a reward system based on an objective function guided by physician-assigned constraints. The planning bot trained itself with 48 plans augmented from 16 previously treated patients, and generated plans for 24 cases in a separate validation set.
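The state, action, and reward formulation described above can be illustrated with a minimal finite-horizon Q-learning sketch. Everything in it is an assumption made for illustration and is not taken from the study: the two-number state (discretized PTV coverage and organ-at-risk violation levels), the action names, the toy dynamics in `step`, the scoring weights, the horizon of 8, and the use of tabular Q-learning. It shows only the general shape of such a model, not the authors' planning bot or a real treatment planning system.

```python
import random
from collections import defaultdict

# Hypothetical action set; the names are illustrative, not from the paper.
ACTIONS = ("no_op", "boost_ptv", "tighten_oar")
HORIZON = 8     # assumed fixed number of planner-like interactions per plan
START = (1, 2)  # assumed start: (PTV coverage level, OAR violation level), each 0..4


def step(state, action):
    """Toy deterministic dynamics (not a dose engine)."""
    cov, oar = state
    if action == "boost_ptv":
        # Raising PTV coverage also pushes more dose toward the OAR.
        cov, oar = min(cov + 1, 4), min(oar + 1, 4)
    elif action == "tighten_oar":
        oar = max(oar - 1, 0)
    return (cov, oar)


def score(state):
    """Assumed plan-quality objective: reward coverage, penalize OAR violation."""
    cov, oar = state
    return cov - 2 * oar


def train(episodes=20000, epsilon=0.5, seed=0):
    """Tabular Q-learning with a time-indexed table Q[(t, state, action)]."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state = START
        for t in range(HORIZON):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(t, state, a)])  # exploit
            nxt = step(state, action)
            reward = score(nxt) - score(state)  # improvement in the objective
            # No future value after the last step of the horizon.
            future = 0.0 if t == HORIZON - 1 else max(q[(t + 1, nxt, a)] for a in ACTIONS)
            # Learning rate 1 is used only because the toy dynamics are deterministic.
            q[(t, state, action)] = reward + future
            state = nxt
    return q


def greedy_plan(q):
    """Roll out the learned policy greedily from the start state."""
    state, plan = START, []
    for t in range(HORIZON):
        action = max(ACTIONS, key=lambda a: q[(t, state, a)])
        plan.append(action)
        state = step(state, action)
    return plan, state


if __name__ == "__main__":
    plan, final_state = greedy_plan(train())
    print(plan, final_state, score(final_state))
```

Indexing the table by time step is what makes the sketch finite-horizon: the value of an action depends on how many interactions remain. Defining the reward as the change in an objective function mirrors, in simplified form, a reward derived from constraint-guided plan quality.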
All 24 bot-generated plans achieved PTV coverage similar to that of the clinical plans while satisfying all clinical planning constraints. Moreover, the knowledge learned by the bot could be visualized and interpreted, and it was consistent with human planning knowledge. The knowledge maps learned in separate training sessions were also consistent with one another, indicating that the learning process is reproducible.
We developed a planning bot that generates high-quality treatment plans for pancreas SBRT. We demonstrated that the training phase of the bot is tractable and reproducible, and the knowledge acquired is interpretable. As a result, the RL planning bot can potentially be incorporated into the clinical workflow and reduce planning inefficiencies.