Department of Applied Mathematics, University of Waterloo, Waterloo, N2L 3G1, Canada.
Sci Rep. 2021 Sep 9;11(1):17882. doi: 10.1038/s41598-021-97028-6.
The in silico development of a chemotherapeutic dosing schedule for treating cancer relies on a parameterization of a particular tumour growth model to describe the dynamics of the cancer in response to the drug dose. In practice, it is often prohibitively difficult to validate patient-specific parameterizations of these models for any particular patient. As a result, sensitivity to these parameters can cause dosing schedules that are optimal in principle to perform poorly on particular patients. In this study, we demonstrate that chemotherapeutic dosing strategies learned via reinforcement learning methods are more robust to perturbations in patient-specific parameter values than those learned via classical optimal control methods. By training a reinforcement learning agent on mean-value parameters and allowing the agent periodic access to a more easily measurable metric, relative bone marrow density, for the purpose of optimizing the dose schedule while reducing drug toxicity, we are able to develop drug dosing schedules that outperform schedules learned via classical optimal control methods, even when such methods are allowed to leverage the same bone marrow measurements.
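The training setup described above can be illustrated with a deliberately simplified sketch. The code below is not the paper's model: the one-compartment log-kill tumour dynamics, the bone-marrow toxicity proxy, the reward weights, and all parameter values are assumptions chosen for demonstration. It shows the general pattern of training a tabular Q-learning agent on a fixed (mean-value) parameterization, where the agent observes a coarse state that includes the marrow-density measurement and trades tumour burden against toxicity.

```python
import random

# Hypothetical toy model (NOT the paper's): one-compartment log-kill tumour
# growth plus a bone-marrow density proxy in [0, 1]. All constants below are
# illustrative assumptions, not fitted parameters.
DOSES = [0.0, 0.5, 1.0]                 # discrete dose levels the agent may pick
GROWTH, KILL = 0.10, 0.35               # tumour growth rate, per-dose kill rate
MARROW_HIT, MARROW_RECOVER = 0.15, 0.05 # toxicity and recovery per interval

def step(tumour, marrow, dose):
    """Advance one treatment interval; return the new (tumour, marrow) state."""
    tumour = max(0.0, tumour + GROWTH * tumour - KILL * dose * tumour)
    marrow = min(1.0, max(0.0, marrow - MARROW_HIT * dose + MARROW_RECOVER))
    return tumour, marrow

def discretize(tumour, marrow):
    # Coarse observable state: binned tumour burden and binned marrow density.
    return (min(int(tumour * 5), 9), min(int(marrow * 5), 4))

def train(episodes=2000, horizon=30, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning on the fixed mean-value parameterization."""
    rng, Q = random.Random(seed), {}
    for _ in range(episodes):
        tumour, marrow = 1.0, 1.0
        for _ in range(horizon):
            s = discretize(tumour, marrow)
            qs = Q.setdefault(s, [0.0] * len(DOSES))
            a = rng.randrange(len(DOSES)) if rng.random() < eps else qs.index(max(qs))
            tumour, marrow = step(tumour, marrow, DOSES[a])
            # Reward trades off tumour burden against marrow toxicity.
            r = -tumour - 2.0 * (1.0 - marrow)
            q_next = Q.setdefault(discretize(tumour, marrow), [0.0] * len(DOSES))
            qs[a] += alpha * (r + gamma * max(q_next) - qs[a])
    return Q

def rollout(Q, horizon=30):
    """Run the greedy learned policy once; return the final tumour burden."""
    tumour, marrow = 1.0, 1.0
    for _ in range(horizon):
        qs = Q.get(discretize(tumour, marrow), [0.0] * len(DOSES))
        tumour, marrow = step(tumour, marrow, DOSES[qs.index(max(qs))])
    return tumour

Q = train()
final_tumour = rollout(Q)
print(f"final tumour burden under learned schedule: {final_tumour:.3f}")
```

Because the agent conditions only on the binned, periodically observable state rather than on the true parameters, the learned schedule degrades gracefully when the underlying dynamics are perturbed, which is the robustness property the study compares against open-loop optimal control solutions.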