Yebin Tao, Lu Wang, Daniel Almirall
Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, USA.
Institute for Social Research, University of Michigan, Ann Arbor, Michigan 48104, USA.
Ann Appl Stat. 2018 Sep;12(3):1914-1938. doi: 10.1214/18-AOAS1137. Epub 2018 Sep 11.
Dynamic treatment regimes (DTRs) are sequences of treatment decision rules in which treatment may be adapted over time in response to the changing course of an individual. Motivated by a substance use disorder (SUD) study, we propose a tree-based reinforcement learning (T-RL) method to directly estimate optimal DTRs in a multi-stage, multi-treatment setting. At each stage, T-RL builds an unsupervised decision tree that directly handles the problem of optimization with multiple treatment comparisons, through a purity measure constructed with augmented inverse probability weighted (AIPW) estimators. Across the multiple stages, the algorithm is implemented recursively using backward induction. By combining semiparametric regression with flexible tree-based learning, T-RL is robust, efficient, and easy to interpret for the identification of optimal DTRs, as shown in the simulation studies. With the proposed method, we identify dynamic SUD treatment regimes for adolescents.
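To illustrate the core single-stage idea, the following is a minimal sketch, not the authors' implementation: AIPW pseudo-outcomes are computed for each treatment arm, and a depth-one "tree" (a single split) is chosen to maximize the estimated value, which stands in for the paper's purity measure. The data-generating setup, the crude arm-mean outcome model, and the exhaustive cutoff grid are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 500, 3                           # subjects and number of treatments (assumed)
x = rng.uniform(-1, 1, size=(n, 2))     # two baseline covariates
a = rng.integers(0, K, size=n)          # randomized treatment, known propensity 1/K
# Hypothetical true optimal rule: treatment 0 if x[:,0] < 0, else treatment 1
opt = np.where(x[:, 0] < 0, 0, 1)
y = 1.0 * (a == opt) + rng.normal(0, 0.1, size=n)

pi = np.full((n, K), 1.0 / K)           # known randomization probabilities
mu = np.zeros((n, K))                   # outcome model: crude arm means for illustration
for k in range(K):
    mu[:, k] = y[a == k].mean()

# AIPW pseudo-outcomes: per-subject estimates of the mean outcome under each arm,
# combining the outcome model with an inverse-probability-weighted residual
aipw = mu + (a[:, None] == np.arange(K)) * (y[:, None] - mu) / pi

def stump_value(j, c):
    """Estimated value of the rule: best arm on each side of the split x[:, j] <= c."""
    left = x[:, j] <= c
    v = 0.0
    for side in (left, ~left):
        if side.any():
            v += aipw[side].sum(axis=0).max()   # assign the best arm to this node
    return v / n

# Exhaustive search over single-split rules: a depth-1 stand-in for the
# purity-maximizing tree construction described in the abstract
best = max(((j, c) for j in range(2) for c in np.linspace(-1, 1, 41)),
           key=lambda jc: stump_value(*jc))
print("split variable:", best[0], "cutoff: %.2f" % best[1])
```

The recovered split should be close to the assumed true rule (variable 0, cutoff near 0). The full T-RL method grows deeper trees with a formal purity criterion and applies this construction recursively backward across stages, replacing the observed outcome at each stage with the estimated optimal value of future stages.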