1 Department of Statistics, Stanford University, Stanford, CA, USA.
2 Department of Health Research and Policy, Stanford University School of Medicine, Stanford, CA, USA.
Stat Methods Med Res. 2019 Jan;28(1):309-320. doi: 10.1177/0962280217723945. Epub 2017 Aug 16.
Personal predictive models for disease development play important roles in chronic disease prevention. The performance of these models is evaluated by applying them to the baseline covariates of participants in external cohort studies and comparing the model predictions with the subjects' subsequent disease incidence. However, the covariate distribution among participants in a validation cohort may differ from that of the population in which the model will be used. Because estimates of predictive model performance depend on the distribution of covariates among the subjects to whom the model is applied, such differences can yield misleading estimates of model performance in the target population. We propose a method for addressing this problem by weighting the cohort subjects so that their covariate distribution better matches that of the target population. Simulations show that the method provides accurate estimates of model performance in the target population, whereas unweighted estimates may not. We illustrate the method by applying it to evaluate an ovarian cancer prediction model targeted to US women, using cohort data from participants in the California Teachers Study. The methods are implemented in RMAP (Risk Model Assessment Package), an open-source R package available for public use at http://stanford.edu/~ggong/rmap/ .
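As a rough illustration of the weighting idea, the Python sketch below reweights cohort subjects within discrete covariate strata so that their weighted stratum mix matches a set of target-population proportions. It then computes a weighted expected-to-observed (E/O) event ratio as a calibration measure and a weighted AUC as a discrimination measure. It assumes a binary outcome observed for every subject (no censoring or competing risks), discrete strata, and known target proportions. The function names and toy data are hypothetical. This is not the RMAP package's API and not necessarily the authors' exact estimator.

```python
from collections import Counter


def stratum_weights(cohort_strata, target_props):
    """Weight each subject by P_target(stratum) / P_cohort(stratum).

    Assumption: target_props (hypothetical input) gives the target-population
    proportion for every stratum present in the cohort; a missing stratum
    raises KeyError.
    """
    n = len(cohort_strata)
    counts = Counter(cohort_strata)
    return [target_props[s] / (counts[s] / n) for s in cohort_strata]


def weighted_eo_ratio(risks, outcomes, weights):
    """Weighted ratio of expected events (sum of predicted risks) to observed events."""
    expected = sum(w * r for w, r in zip(weights, risks))
    observed = sum(w * y for w, y in zip(weights, outcomes))
    return expected / observed


def weighted_auc(risks, outcomes, weights):
    """Weighted probability that a case's risk exceeds a control's (ties count 1/2)."""
    cases = [(r, w) for r, y, w in zip(risks, outcomes, weights) if y == 1]
    controls = [(r, w) for r, y, w in zip(risks, outcomes, weights) if y == 0]
    concordant = 0.0
    for r_case, w_case in cases:
        for r_ctrl, w_ctrl in controls:
            if r_case > r_ctrl:
                concordant += w_case * w_ctrl
            elif r_case == r_ctrl:
                concordant += 0.5 * w_case * w_ctrl
    total_pairs = sum(w for _, w in cases) * sum(w for _, w in controls)
    return concordant / total_pairs


if __name__ == "__main__":
    # Toy cohort: stratum "B" is under-represented relative to the assumed target.
    strata = ["A", "A", "A", "B"]
    risks = [0.1, 0.2, 0.3, 0.4]
    outcomes = [0, 1, 0, 1]
    w = stratum_weights(strata, {"A": 0.5, "B": 0.5})
    ones = [1.0] * len(risks)
    print("unweighted E/O:", weighted_eo_ratio(risks, outcomes, ones))
    print("weighted E/O:  ", weighted_eo_ratio(risks, outcomes, w))
    print("unweighted AUC:", weighted_auc(risks, outcomes, ones))
    print("weighted AUC:  ", weighted_auc(risks, outcomes, w))
```

In this toy cohort, stratum "B" makes up a quarter of the subjects but half of the assumed target population, so its subject gets weight 2 and each "A" subject gets weight 2/3. Reweighting moves the E/O ratio from 0.5 to 0.45 and the AUC from 0.75 to 0.875, showing how the same predictions can score differently once the covariate mix matches the target. The ratio-of-proportions weight is the standard post-stratification choice. With continuous covariates one would instead need a model for the density ratio, which this sketch does not attempt.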