Am J Epidemiol. 2023 Feb 1;192(2):296-304. doi: 10.1093/aje/kwac128.
We consider methods for transporting a prediction model to a new target population in the setting where outcome and covariate data for model development are available from a source population whose covariate distribution differs from that of the target population, and where covariate data (but not outcome data) are available from the target population. We discuss how to tailor the prediction model to account for differences in the data distribution between the source and target populations, and how to assess the model's performance (e.g., by estimating the mean squared prediction error) in the target population. We provide identifiability results for measures of model performance in the target population for a potentially misspecified prediction model under a sampling design in which the source and target population samples are obtained separately. We introduce the concept of prediction error modifiers, which can be used to reason about tailoring measures of model performance to the target population. We illustrate the methods in simulated data and apply them to transport a prediction model for lung cancer diagnosis from the National Lung Screening Trial to the nationally representative target population of trial-eligible individuals in the National Health and Nutrition Examination Survey.
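As an illustration of assessing performance in the target population, the following is a minimal sketch of one possible approach: reweighting source-sample squared errors by the estimated inverse odds of being in the source sample. It is not the authors' implementation. The function names (`fit_logistic`, `inverse_odds_weights`), the simulated data-generating process, and the choice of a logistic model for sample membership are all assumptions made for illustration. The sketch also assumes that the outcome distribution given covariates is the same in both populations, and that every covariate pattern in the target population can occur in the source population.

```python
import numpy as np

def fit_logistic(X, s, n_iter=25, ridge=1e-8):
    """Newton-Raphson logistic regression of s on X (intercept added). Hypothetical helper."""
    Z = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (s - p)
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z + ridge * np.eye(Z.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def inverse_odds_weights(X_src, X_tgt):
    """Estimate P(target | X) / P(source | X) for each source observation."""
    X_all = np.concatenate([X_src, X_tgt], axis=0)
    s = np.concatenate([np.ones(len(X_src)), np.zeros(len(X_tgt))])  # 1 = source
    beta = fit_logistic(X_all, s)
    Z_src = np.column_stack([np.ones(len(X_src)), X_src])
    p_src = 1.0 / (1.0 + np.exp(-Z_src @ beta))
    return (1.0 - p_src) / p_src

# Simulated example (assumed setup): covariate shift only, same outcome model.
rng = np.random.default_rng(0)
n = 50_000
X_src = rng.normal(0.0, 1.0, n)   # source covariates
X_tgt = rng.normal(0.5, 1.0, n)   # target covariates (shifted mean)
y_src = X_src**2 + rng.normal(0.0, 1.0, n)
# Target outcomes are simulated only to check the estimate; in practice they are unobserved.
y_tgt = X_tgt**2 + rng.normal(0.0, 1.0, n)

# Deliberately misspecified prediction model: a straight line fit on source data.
coef = np.polyfit(X_src, y_src, 1)
sq_err_src = (y_src - np.polyval(coef, X_src)) ** 2

# Naive estimate: source-sample MSE, which ignores the covariate shift.
naive = sq_err_src.mean()

# Weighted estimate: source squared errors reweighted toward the target covariate distribution.
w = inverse_odds_weights(X_src, X_tgt)
weighted = np.sum(w * sq_err_src) / np.sum(w)

# Benchmark available only in simulation.
truth = np.mean((y_tgt - np.polyval(coef, X_tgt)) ** 2)

print(f"naive source MSE:    {naive:.3f}")
print(f"weighted target MSE: {weighted:.3f}")
print(f"true target MSE:     {truth:.3f}")
```

Because the linear model is misspecified, its squared error varies with the covariate, so the source-sample MSE and the target-population MSE differ. The weighting step is what moves the estimate toward the target value. With real data, the membership model would need to be checked, and extreme weights would call for care.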