Hargrave Mason, Spaeth Alex, Grosenick Logan
Center for Studies in Physics and Biology, The Rockefeller University, New York, NY, USA.
Dept. of Electrical and Computer Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA.
Adv Neural Inf Process Syst. 2024;37:130536-130568.
Healthcare applications pose significant challenges to existing reinforcement learning (RL) methods due to implementation risks, limited data availability, short treatment episodes, sparse rewards, partial observability, and heterogeneous treatment effects. Despite significant interest in using RL to generate dynamic treatment regimes for longitudinal patient care scenarios, no standardized benchmark has yet been developed. To fill this need, we introduce (), a benchmark designed to mimic the challenges associated with applying RL to longitudinal healthcare settings. We leverage this benchmark to test five state-of-the-art offline RL models as well as five common off-policy evaluation (OPE) techniques. Our results suggest that while offline RL may be capable of improving upon existing standards of care given sufficient data, its applicability does not appear to extend to the moderate-to-low data regimes typical of current healthcare settings. Additionally, we demonstrate that several OPE techniques standard in the medical RL literature fail to perform adequately on our benchmark. These results suggest that the performance of RL models in dynamic treatment regimes may be difficult to meaningfully evaluate using current OPE methods, indicating that RL for this application domain may still be in its early stages. We hope that these results, along with the benchmark, will facilitate better comparison of existing methods and inspire further research into techniques that increase the practical applicability of medical RL.