Laber Eric B, Staicu Ana-Maria
Department of Statistics, North Carolina State University, Raleigh, NC, 27695, U.S.A.
J Am Stat Assoc. 2017;113(523):1219-1227. doi: 10.1080/01621459.2017.1321545. Epub 2017 Jun 26.
Evidence-based personalized medicine formalizes treatment selection as an individualized treatment regime that maps up-to-date patient information into the space of possible treatments. Available patient information may include static features such as race, gender, family history, and genetic and genomic information, as well as longitudinal information including the emergence of comorbidities, waxing and waning of symptoms, side-effect burden, and adherence. Dynamic information measured at multiple time points before treatment assignment should be included as input to the treatment regime. However, subject longitudinal measurements are typically sparse, irregularly spaced, noisy, and vary in number across subjects. Existing estimators for treatment regimes require that the same information be measured on each subject, and thus standard practice is to reduce each subject's longitudinal information to an ad hoc scalar summary during data pre-processing. This reduction precedes estimation of a treatment regime and is therefore not informed by subject outcomes, treatments, or covariates. Furthermore, we show that this reduction requires more stringent causal assumptions for consistent estimation than are necessary. We propose a data-driven method for constructing maximally prescriptive yet interpretable features that can be used with standard methods for estimating optimal treatment regimes. In our proposed framework, we treat the subject longitudinal information as a realization of a stochastic process observed with error at discrete time points. Functionals of this latent process are then combined with outcome models to estimate an optimal treatment regime. The proposed methodology requires weaker causal assumptions than Q-learning with an ad hoc scalar summary and is consistent for the optimal treatment regime.
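To make the role of the scalar summary concrete, the following is a minimal simulation sketch, not the estimator proposed in the paper. It assumes a toy generative model (a latent linear trajectory per subject, a randomized binary treatment coded -1/+1, and an outcome whose treatment effect depends only on the trajectory's slope), and it compares single-stage Q-learning with a linear working model under two fixed summaries: the subject mean and a per-subject least-squares slope. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate(n, rng):
    """Toy data: sparse, irregular, noisy observations of a latent linear trajectory."""
    times, obs, slopes = [], [], np.empty(n)
    for i in range(n):
        m = rng.integers(3, 8)  # number of measurements varies across subjects (3 to 7)
        t = np.sort(rng.uniform(0.0, 1.0, size=m))  # irregularly spaced time points
        intercept, slope = rng.normal(0.0, 1.0, size=2)
        times.append(t)
        obs.append(intercept + slope * t + rng.normal(0.0, 0.1, size=m))
        slopes[i] = slope
    a = rng.choice([-1.0, 1.0], size=n)  # randomized treatment
    # Assumed outcome model: the treatment effect depends only on the latent slope.
    y = 1.0 + 2.0 * a * slopes + rng.normal(0.0, 0.5, size=n)
    return times, obs, slopes, a, y


def q_learning_rule(x, a, y):
    """Fit Q(x, a) = b0 + b1*x + a*(b2 + b3*x) by least squares; return the argmax rule."""
    design = np.column_stack([np.ones_like(x), x, a, a * x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.where(beta[2] + beta[3] * x > 0, 1.0, -1.0)


def value(rule, slopes):
    """Mean outcome if everyone followed `rule`, using the known toy outcome model."""
    return float(np.mean(1.0 + 2.0 * rule * slopes))


rng = np.random.default_rng(0)
times, obs, slopes, a, y = simulate(2000, rng)

# Two fixed scalar summaries of each subject's longitudinal record.
x_mean = np.array([w.mean() for w in obs])
x_slope = np.array([np.polyfit(t, w, 1)[0] for t, w in zip(times, obs)])

value_mean = value(q_learning_rule(x_mean, a, y), slopes)
value_slope = value(q_learning_rule(x_slope, a, y), slopes)
print(f"in-sample value, mean summary:  {value_mean:.3f}")
print(f"in-sample value, slope summary: {value_slope:.3f}")
```

In this toy setting the slope summary carries the information that determines the best treatment, so the rule built on it should attain a higher value than the rule built on the mean. The sketch picks the slope by hand because the simulation's truth is known; the abstract's point is that such a feature should instead be constructed from the data, informed by outcomes and treatments.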