Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA.
Institute for Social Research, University of Michigan, Ann Arbor, Michigan, USA.
Biometrics. 2023 Sep;79(3):2260-2271. doi: 10.1111/biom.13754. Epub 2022 Oct 9.
A dynamic treatment regime (DTR) is a sequence of decision rules that provide guidance on how to treat individuals based on their static and time-varying status. Existing observational data are often used to generate hypotheses about effective DTRs. A common challenge with observational data, however, is the need for analysts to consider "restrictions" on the treatment sequences. Such restrictions may be necessary for settings where (1) one or more treatment sequences that were offered to individuals when the data were collected are no longer considered viable in practice, (2) specific treatment sequences are no longer available, or (3) the scientific focus of the analysis concerns a specific type of treatment sequence (e.g., "stepped-up" treatments). To address this challenge, we propose a restricted tree-based reinforcement learning (RT-RL) method that searches for an interpretable DTR with the maximum expected outcome, given a (set of) user-specified restriction(s), which specifies treatment options (at each stage) that ought not to be considered as part of the estimated tree-based DTR. In simulations, we evaluate the performance of RT-RL versus the standard approach of ignoring the partial data for individuals not following the (set of) restriction(s). The method is illustrated using an observational data set to estimate a two-stage stepped-up DTR for guiding the level of care placement for adolescents with substance use disorder.
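As a rough illustration of what a "stepped-up" restriction means at the second stage, the Python sketch below removes any stage-2 option that steps down from the stage-1 level, then picks the remaining option with the largest estimated mean outcome. It is not the RT-RL algorithm: the function names, the integer coding of care levels, and the outcome values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Set


def stepped_up_allowed(prev_level: int, levels: Set[int]) -> Set[int]:
    """Stage-2 options that do not step down from the stage-1 level.

    Assumption: care levels are coded as integers, higher = more intensive.
    """
    return {a for a in levels if a >= prev_level}


def best_allowed(estimated_outcome: Dict[int, float], allowed: Set[int]) -> int:
    """Return the allowed option with the largest estimated mean outcome."""
    if not allowed:
        raise ValueError("restriction leaves no treatment option")
    # sorted() makes tie-breaking deterministic (lowest level wins a tie).
    return max(sorted(allowed), key=lambda a: estimated_outcome[a])


# Hypothetical care levels and made-up estimated mean outcomes (not from the paper).
levels = {0, 1, 2}
est = {0: 5.0, 1: 3.5, 2: 4.2}

# An individual who received level 1 at stage 1 may only stay at 1 or step up to 2.
choice = best_allowed(est, stepped_up_allowed(1, levels))
unrestricted_choice = best_allowed(est, levels)
```

With these made-up numbers the unrestricted choice is level 0, while the restricted choice is level 2, since level 0 would be a step down. The sketch covers only the restriction step and says nothing about how RT-RL estimates outcomes or builds the tree.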