Sun Zhaohong, Dong Wei, Li Haomin, Huang Zhengxing
Zhejiang University, Hangzhou, China.
Department of Cardiology, Chinese PLA General Hospital, Beijing, China.
J Biomed Inform. 2023 Jan;137:104244. doi: 10.1016/j.jbi.2022.104244. Epub 2022 Nov 17.
Treatment recommendation, the task of delivering effective interventions according to patient state and expected outcome, plays a vital role in precision medicine and healthcare management. Reinforcement learning is well suited to learning optimal policies for recommender systems and is therefore a promising approach to treatment recommendation. However, most existing solutions require frequent interaction between the treatment recommender system and the clinical environment, which is expensive, time-consuming, and even infeasible in clinical practice. In this study, we present a novel model-based offline reinforcement learning approach that optimizes a treatment policy using patient treatment trajectories in Electronic Health Records (EHRs). Specifically, a patient treatment trajectory simulator is first constructed from the ground-truth trajectories in EHRs. The simulator is then used to model the online interactions between the treatment recommender system and the clinical environment, so that counterfactual trajectories can be generated. To alleviate the bias arising from the ground-truth and counterfactual trajectories, an adversarial network is incorporated into the proposed model, such that a large space of treatment actions can be explored with scaled rewards. The proposed model is evaluated on a simulated dataset and a real-world dataset. The experimental results show that it outperforms the other methods compared, offering a new solution for dynamic treatment regimes and beyond.
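The pipeline outlined in the abstract (fit a simulator on logged trajectories, roll out counterfactual trajectories, scale the rewards, optimize a policy) can be illustrated with a minimal tabular sketch. This is not the authors' model. The state and action names, the hyperparameters, and every function name are hypothetical. In particular, `realism_score` is a simple frequency-based stand-in for the adversarial network, whose actual form the abstract does not specify, and tabular Q-learning is swapped in for whatever policy optimizer the paper uses.

```python
import random
from collections import Counter, defaultdict

def fit_simulator(logged):
    """Estimate P(next_state, reward | state, action) by counting logged transitions."""
    model = defaultdict(Counter)
    for s, a, s_next, r in logged:
        model[(s, a)][(s_next, r)] += 1
    return model

def realism_score(logged):
    """Assumed stand-in for a discriminator: share of logged visits to s that took a."""
    sa = Counter((s, a) for s, a, _, _ in logged)
    s_tot = Counter(s for s, _, _, _ in logged)
    return lambda s, a: sa[(s, a)] / s_tot[s] if s_tot[s] else 0.0

def simulate_step(model, s, a, rng):
    """Sample one counterfactual transition; None if (s, a) never appears in the log."""
    outcomes = model.get((s, a))
    if not outcomes:
        return None
    pairs = list(outcomes)
    weights = [outcomes[p] for p in pairs]
    return rng.choices(pairs, weights=weights, k=1)[0]

def learn_policy(logged, states, actions, episodes=2000, horizon=5,
                 gamma=0.9, alpha=0.1, eps=0.2, seed=0):
    """Q-learning on simulated rollouts only; the real environment is never queried."""
    rng = random.Random(seed)
    model = fit_simulator(logged)
    score = realism_score(logged)
    q = defaultdict(float)
    starts = [s for s, _, _, _ in logged]
    for _ in range(episodes):
        s = rng.choice(starts)
        for _ in range(horizon):
            # Epsilon-greedy exploration over the treatment actions.
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda x: q[(s, x)])
            step = simulate_step(model, s, a, rng)
            if step is None:  # terminal state or unsupported pair
                break
            s_next, r = step
            # Scale the simulated reward by how well the log supports (s, a).
            scaled = score(s, a) * r
            target = scaled + gamma * max(q[(s_next, b)] for b in actions)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s_next
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}

# Hypothetical logged transitions: (state, action, next_state, reward).
logged = (
    [("sick", "drug_A", "stable", 0.0)] * 3
    + [("sick", "drug_B", "sick", 0.0)] * 3
    + [("stable", "drug_A", "stable", 0.0)] * 2
    + [("stable", "drug_B", "recovered", 1.0)] * 4
)
policy = learn_policy(logged, ["sick", "stable"], ["drug_A", "drug_B"])
print(policy)
```

Scaling the reward by a support score is one simple way to discount simulated transitions that the log backs only weakly. A learned discriminator, as the abstract describes, would replace the counting heuristic used here.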