Moodie Erica E M, Chakraborty Bibhas, Kramer Michael S
McGill University, Department of Epidemiology, Biostatistics, and Occupational Health, QC, Canada H3A 1A2.
Can J Stat. 2012 Dec 1;40(4):629-645. doi: 10.1002/cjs.11162. Epub 2012 Nov 7.
The area of dynamic treatment regimes (DTR) aims to make inference about adaptive, multistage decision-making in clinical practice. A DTR is a set of decision rules, one per interval of treatment, where each decision is a function of treatment and covariate history that returns a recommended treatment. Q-learning is a popular method from the reinforcement learning literature that has recently been applied to estimate DTRs. While, in principle, Q-learning can be used for both randomized and observational data, the focus in the literature thus far has been exclusively on the randomized treatment setting. We extend the method to incorporate measured confounding covariates, using direct adjustment and a variety of propensity score approaches. The methods are examined under various settings including non-regular scenarios. We illustrate the methods in examining the effect of breastfeeding on vocabulary testing, based on data from the Promotion of Breastfeeding Intervention Trial.
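The abstract describes Q-learning with direct adjustment for measured confounders only in words. The following is a minimal sketch of that idea for two stages, not the authors' implementation. It assumes binary treatments coded 0/1, linear working models fitted by ordinary least squares, and simulated data in which treatment assignment depends on a measured covariate. All variable names, model forms, and parameter values are hypothetical, and the propensity score approaches mentioned in the abstract are not shown.

```python
import numpy as np


def fit_ols(design, y):
    """Ordinary least squares coefficients for y ~ design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def q_learning_two_stage(x1, a1, x2, a2, y):
    """Two-stage Q-learning with direct covariate adjustment.

    Assumed (hypothetical) working models:
      stage 2: y      ~ 1 + x1 + a1 + x2 + a2 + a2*x2
      stage 1: pseudo ~ 1 + x1 + a1 + a1*x1
    The confounders x1 and x2 enter the models as main effects.
    """
    ones = np.ones(len(y))

    # Stage 2: regress the outcome on history and stage-2 treatment.
    d2 = np.column_stack([ones, x1, a1, x2, a2, a2 * x2])
    beta2 = fit_ols(d2, y)

    # Estimated effect of a2 = 1 versus a2 = 0 for each subject.
    blip2 = beta2[4] + beta2[5] * x2

    # Pseudo-outcome: predicted outcome under the best stage-2 treatment.
    pseudo = d2[:, :4] @ beta2[:4] + np.maximum(blip2, 0.0)

    # Stage 1: regress the pseudo-outcome on stage-1 history and treatment.
    d1 = np.column_stack([ones, x1, a1, a1 * x1])
    beta1 = fit_ols(d1, pseudo)
    return beta1, beta2


def recommend(psi0, psi1, x):
    """Decision rule: treat (1) when the estimated treatment effect is positive."""
    return (psi0 + psi1 * x > 0).astype(int)


def expit(z):
    return 1.0 / (1.0 + np.exp(-z))


# Simulated observational data (illustrative values only).
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
a1 = rng.binomial(1, expit(x1))          # treatment depends on x1: confounded
x2 = 0.5 * x1 + rng.normal(size=n)
a2 = rng.binomial(1, expit(x2))          # treatment depends on x2: confounded
y = (1.0 + x1 + x2
     + a1 * (0.3 + 0.5 * x1)
     + a2 * (0.5 - 1.0 * x2)
     + rng.normal(size=n))

beta1, beta2 = q_learning_two_stage(x1, a1, x2, a2, y)

# Estimated rules: one decision rule per stage, each a function of history.
rule2 = recommend(beta2[4], beta2[5], x2)
rule1 = recommend(beta1[2], beta1[3], x1)
```

In this sketch the stage-2 model is correctly specified, so its treatment coefficients should land near the simulated values. The stage-1 working model is only an approximation, because the maximization in the pseudo-outcome makes the true stage-1 Q-function nonlinear in the covariates.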