Murray Thomas A, Yuan Ying, Thall Peter F
Department of Biostatistics, MD Anderson Cancer Center.
J Am Stat Assoc. 2018;113(523):1255-1267. doi: 10.1080/01621459.2017.1340887. Epub 2018 Oct 8.
Medical therapy often consists of multiple stages, with a treatment chosen by the physician at each stage based on the patient's history of treatments and clinical outcomes. These decisions can be formalized as a dynamic treatment regime. This paper describes a new approach for optimizing dynamic treatment regimes that bridges the gap between Bayesian inference and existing approaches, like Q-learning. The proposed approach fits a series of Bayesian regression models, one for each stage, in reverse sequential order. Each model uses as a response variable the remaining payoff assuming optimal actions are taken at subsequent stages, and as covariates the current history and relevant actions at that stage. The key difficulty is that the optimal decision rules at subsequent stages are unknown, and even if these decision rules were known the relevant response variables may be counterfactual. However, posterior distributions can be derived from the previously fitted regression models for the optimal decision rules and the counterfactual response variables under a particular set of rules. The proposed approach averages over these posterior distributions when fitting each regression model. An efficient sampling algorithm for estimation is presented, along with simulation studies that compare the proposed approach with Q-learning.
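The backward fitting procedure described in the abstract can be illustrated with a minimal two-stage sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a conjugate Bayesian linear model with known noise variance stands in for whatever regression models the authors use, actions are binary, each history is a single scalar, and the names `posterior`, `design`, and `fit_two_stage` are hypothetical. For each posterior draw of the stage-2 model, the sketch finds the optimal stage-2 action, keeps the observed payoff when the patient actually received that action, imputes a counterfactual payoff otherwise, and then refits the stage-1 model on the resulting remaining payoff.

```python
import numpy as np

def posterior(X, y, sigma2=1.0, tau2=100.0):
    """Posterior mean and covariance of beta for
    y ~ N(X beta, sigma2 I) with prior beta ~ N(0, tau2 I) (assumed model)."""
    precision = X.T @ X / sigma2 + np.eye(X.shape[1]) / tau2
    cov = np.linalg.inv(precision)
    cov = (cov + cov.T) / 2.0  # remove tiny numerical asymmetry
    mean = cov @ X.T @ y / sigma2
    return mean, cov

def design(h, a):
    """Columns: intercept, history, action, history-by-action interaction."""
    return np.column_stack([np.ones_like(h), h, a, h * a])

def fit_two_stage(h1, a1, y1, h2, a2, y2, n_draws=200, sigma2=1.0, seed=0):
    """Return posterior draws of the stage-1 and stage-2 coefficients."""
    rng = np.random.default_rng(seed)
    # Stage 2 is fitted first: observed payoff on stage-2 history and action.
    m2, c2 = posterior(design(h2, a2), y2, sigma2)
    beta2_draws = rng.multivariate_normal(m2, c2, size=n_draws)
    beta1_draws = np.empty((n_draws, 4))
    for k, b2 in enumerate(beta2_draws):
        # Optimal stage-2 action for each patient under this posterior draw.
        q0 = design(h2, np.zeros_like(h2)) @ b2
        q1 = design(h2, np.ones_like(h2)) @ b2
        a2_opt = (q1 > q0).astype(float)
        # Payoff under the optimal action: the observed value if the patient
        # received that action, otherwise a counterfactual model draw.
        mean_opt = np.where(a2_opt == 1.0, q1, q0)
        y2_cf = mean_opt + rng.normal(0.0, np.sqrt(sigma2), size=h2.shape)
        y2_opt = np.where(a2 == a2_opt, y2, y2_cf)
        # Stage 1: regress the remaining payoff on stage-1 history and action.
        # Reusing sigma2 here is a simplification of this sketch.
        m1, c1 = posterior(design(h1, a1), y1 + y2_opt, sigma2)
        beta1_draws[k] = rng.multivariate_normal(m1, c1)
    return beta1_draws, beta2_draws
```

Pooling the stage-1 draws across the loop is what averages over the uncertainty in the stage-2 rule and in the imputed counterfactual payoffs. A stage-1 decision rule could then be read off by comparing the fitted payoff under each action, either per draw or at the posterior mean.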