Baqun Zhang, Anastasios A. Tsiatis, Eric B. Laber, Marie Davidian
Department of Preventive Medicine, Northwestern University, 680 N. Lakeshore Drive, Suite 1400, Chicago, Illinois 60611, U.S.A.
Biometrika. 2013;100(3). doi: 10.1093/biomet/ast014.
A dynamic treatment regime is a list of sequential decision rules for assigning treatment based on a patient's history. Q- and A-learning are the two main approaches for estimating the optimal regime, i.e., the regime yielding the most beneficial outcome in the patient population, using data from a clinical trial or observational study. Q-learning requires postulated regression models for the outcome, while A-learning involves models for the part of the outcome regression representing treatment contrasts and for treatment assignment. We propose an alternative to Q- and A-learning that maximizes a doubly robust augmented inverse probability weighted estimator of the population mean outcome over a restricted class of regimes. Simulations demonstrate the method's performance and its robustness to model misspecification, a key practical concern.