Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, USA.
Department of Electrical and Computer Engineering, National Chiao Tung University, Hsinchu, Taiwan.
Med Phys. 2017 Dec;44(12):6690-6705. doi: 10.1002/mp.12625. Epub 2017 Nov 14.
To investigate deep reinforcement learning (DRL) based on historical treatment plans for developing automated radiation adaptation protocols for non-small cell lung cancer (NSCLC) patients that aim to maximize tumor local control at reduced rates of grade 2 radiation pneumonitis (RP2).
In a retrospective population of 114 NSCLC patients who received radiotherapy, a three-component neural network framework was developed for deep reinforcement learning (DRL) of dose fractionation adaptation. Large-scale patient characteristics included clinical, genetic, and imaging radiomics features in addition to tumor and lung dosimetric variables. First, a generative adversarial network (GAN) was employed to learn patient population characteristics necessary for DRL training from a relatively limited sample size. Second, a radiotherapy artificial environment (RAE) was reconstructed by a deep neural network (DNN) utilizing both original and synthetic (GAN-generated) data to estimate the transition probabilities for adapting personalized radiotherapy treatment courses. Third, a deep Q-network (DQN) was applied to the RAE to choose the optimal dose in a response-adapted treatment setting. This multicomponent reinforcement learning approach was benchmarked against real clinical decisions applied in an adaptive dose escalation clinical protocol, in which 34 patients were treated based on avid PET signal in the tumor and constrained by a 17.2% normal tissue complication probability (NTCP) limit for RP2. The uncomplicated cure probability (P+) was used as a baseline reward function in the DRL.
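As a rough illustration of how the decision-making component of such a pipeline could be wired together, the sketch below defines an uncomplicated cure probability reward of the common form P+ = TCP × (1 − NTCP) (one standard formulation, assumed here) and a minimal deep Q-network that maps a patient state vector to a discrete adaptive dose choice. The network size, state dimensionality, dose grid, and all names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (PyTorch) of a DQN that selects an adaptive dose per fraction.
# Assumptions (not from the paper): the state is a 20-feature vector of
# clinical/radiomics/dosimetric variables, the dose grid spans 1-5 Gy in
# 0.5-Gy steps, and the reward is P+ = TCP * (1 - NTCP).
import torch
import torch.nn as nn

DOSE_GRID = torch.arange(1.0, 5.5, 0.5)  # candidate adaptive doses per fraction (Gy)

def p_plus_reward(tcp: float, ntcp: float) -> float:
    """Uncomplicated cure probability used as the baseline reward (assumed form)."""
    return tcp * (1.0 - ntcp)

class DoseDQN(nn.Module):
    """Maps a patient/treatment state vector to Q-values over candidate doses."""
    def __init__(self, state_dim: int = 20, n_actions: int = len(DOSE_GRID)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def choose_dose(dqn: DoseDQN, state: torch.Tensor, epsilon: float = 0.1) -> float:
    """Epsilon-greedy dose selection over the discrete dose grid."""
    if torch.rand(()) < epsilon:
        idx = torch.randint(len(DOSE_GRID), ())
    else:
        with torch.no_grad():
            idx = dqn(state).argmax()
    return float(DOSE_GRID[idx])

# Example: pick an adaptive dose for one synthetic mid-course patient state.
dqn = DoseDQN()
state = torch.randn(20)        # placeholder feature vector, not patient data
print(choose_dose(dqn, state)) # e.g., 2.5 (Gy)
```

In the paper's setting the environment transitions would come from the RAE (trained on original plus GAN-synthesized patients) rather than from real patients, and the Q-network would be trained against the P+-based reward within that simulated environment.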
Taking our adaptive dose escalation protocol as a blueprint for the proposed DRL (GAN + RAE + DQN) architecture, we obtained an automated dose adaptation estimate for use at approximately two-thirds of the way into the radiotherapy treatment course. When the DQN component was allowed to freely control the estimated adaptive dose per fraction (ranging from 1 to 5 Gy), the DRL automatically favored dose escalation/de-escalation between 1.5 and 3.8 Gy, a range similar to that used in the clinical protocol. The same DQN yielded two dose-escalation patterns for the 34 test patients, depending on the reward-function variant used. First, with the baseline P+ reward function, the individual adaptive fraction doses suggested by the DQN followed tendencies similar to the clinical data, with an RMSE = 0.76 Gy, although the DQN-suggested adaptations were generally lower in magnitude (less aggressive). Second, by adjusting the P+ reward function to place higher emphasis on mitigating local failure, better matching of doses between the DQN and the clinical protocol was achieved, with an RMSE = 0.5 Gy. Moreover, the decisions selected by the DQN appeared to have better concordance with patients' eventual outcomes. In comparison, the traditional temporal difference (TD) algorithm for reinforcement learning yielded an RMSE = 3.3 Gy due to numerical instabilities and insufficient learning.
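To make the reported agreement metric concrete, the snippet below shows how the per-fraction RMSE between DQN-suggested and clinically delivered adaptive doses could be computed; the dose values are hypothetical placeholders, and the RMSE definition is the standard one, assumed to match the paper's usage.

```python
# Sketch: RMSE between DQN-suggested and clinically delivered adaptive doses (Gy).
# The dose arrays below are hypothetical placeholders, not patient data.
import numpy as np

dqn_doses      = np.array([2.0, 2.5, 1.5, 3.0])  # doses proposed by the trained DQN
clinical_doses = np.array([2.5, 3.0, 1.5, 3.8])  # doses chosen under the clinical protocol

rmse = np.sqrt(np.mean((dqn_doses - clinical_doses) ** 2))
print(f"RMSE = {rmse:.2f} Gy")  # smaller values indicate closer agreement with clinicians
```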
We demonstrated that automated dose adaptation by DRL is a feasible and promising approach for achieving results similar to those chosen by clinicians. The process may require customization of the reward function when individual cases are considered. However, developing this framework into a fully credible autonomous system for clinical decision support would require further validation on larger multi-institutional datasets.