UMC Utrecht Julius Center, Utrecht University, Utrecht, The Netherlands.
Department of Clinical Epidemiology, Leiden University Medical Centre, Leiden, The Netherlands.
J Clin Epidemiol. 2019 Jan;105:136-141. doi: 10.1016/j.jclinepi.2018.09.001. Epub 2018 Sep 14.
Diagnostic and prognostic prediction models often perform poorly when externally validated. We investigate how differences in the measurement of predictors across settings affect the discriminative power and transportability of a prediction model.
Differences in predictor measurement between data sets can be described formally using a measurement error taxonomy. Using this taxonomy, we derive an expression relating variation in the measurement of a continuous predictor to the area under the receiver operating characteristic curve (AUC) of a logistic regression prediction model. This expression is used to demonstrate how variation in measurements across settings affects the out-of-sample discriminative ability of a prediction model. We illustrate these findings with a diagnostic prediction model using example data of patients suspected of having deep venous thrombosis.
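As an illustrative sketch only, and not necessarily the expression derived in the paper, the link between measurement noise and the AUC can be written down under a simple binormal assumption. All symbols below ($\mu_0$, $\mu_1$, $\sigma^2$, $\tau^2$, $c$, $X$, $W$, $U$, $D$) are assumptions introduced here: a single continuous predictor $X$ is normally distributed with equal variance among noncases ($D=0$) and cases ($D=1$), and the setting-specific measurement $W$ adds independent, non-differential noise $U$. Because a single-predictor logistic model is a monotone function of that predictor, its AUC equals the AUC of the predictor itself (for a positive coefficient).

```latex
% Assumed binormal setup (not taken from the source):
%   X | D=d ~ N(\mu_d, \sigma^2),  \mu_1 > \mu_0
%   W = X + U,  U ~ N(0, \tau^2),  U independent of X and D
\begin{align}
  \mathrm{AUC}_X &= \Phi\!\left(\frac{\mu_1 - \mu_0}{\sqrt{2\sigma^2}}\right), \\
  \mathrm{AUC}_W &= \Phi\!\left(\frac{\mu_1 - \mu_0}{\sqrt{2(\sigma^2 + \tau^2)}}\right)
                  \le \mathrm{AUC}_X .
\end{align}
% A constant shift W = X + c applied equally to cases and noncases
% leaves \mu_1 - \mu_0 and the variance unchanged, so the AUC is unchanged.
```

Under these assumptions, a larger noise variance $\tau^2$ in the validation setting pulls the AUC toward 0.5, while a shift shared by cases and noncases has no effect.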
When a predictor, such as D-dimer, is measured with more noise in one setting than in another, which we conceptualize as a difference in "classical" measurement error, the expected value of the AUC decreases. In contrast, constant, "structural" measurement error does not affect the AUC of a logistic regression model, provided the magnitude of the error is the same among cases and noncases. As the differences in measurement methods between settings (and in turn the differences in measurement error structures) become more complex, it becomes increasingly difficult to predict how the AUC will differ between settings.
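The pattern described above can be reproduced with a small simulation. This is a minimal sketch under assumed conditions, not the authors' analysis and not the deep venous thrombosis data: a single normally distributed predictor, a hypothetical mean difference of 1 between cases and noncases, and hypothetical error magnitudes. The helper `auc` and all variable names are introduced here for illustration. With one predictor and a positive coefficient, the AUC of a logistic model equals the AUC of the predictor, so no model fitting is needed.

```python
import numpy as np


def auc(scores_cases, scores_noncases):
    """Empirical AUC: chance that a random case scores higher than a
    random noncase, counting ties as one half."""
    diff = scores_cases[:, None] - scores_noncases[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


rng = np.random.default_rng(0)
n = 2000  # hypothetical number of cases and of noncases

# Predictor as measured in the reference (development) setting.
x_cases = rng.normal(1.0, 1.0, n)
x_noncases = rng.normal(0.0, 1.0, n)
auc_reference = auc(x_cases, x_noncases)

# Classical error: extra independent noise with the same variance
# in cases and noncases.
w_cases = x_cases + rng.normal(0.0, 1.0, n)
w_noncases = x_noncases + rng.normal(0.0, 1.0, n)
auc_classical = auc(w_cases, w_noncases)

# Constant structural error: the same shift for everyone.
shift = 0.5
auc_constant = auc(x_cases + shift, x_noncases + shift)

# Differential structural error: the shift applies to cases only.
auc_differential = auc(x_cases + shift, x_noncases)

print(f"reference setting:        {auc_reference:.3f}")
print(f"added classical noise:    {auc_classical:.3f}")
print(f"constant shift (all):     {auc_constant:.3f}")
print(f"shift in cases only:      {auc_differential:.3f}")
```

Under these binormal assumptions the theoretical AUCs are about 0.76 in the reference setting and about 0.69 with the added noise; the simulated values vary around those figures. The constant shift leaves the AUC unchanged, whereas a shift confined to cases raises it, which shows how differential error can move discrimination in either direction.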
When a prediction model is applied in a setting other than the one in which it was developed, its discriminative ability can decrease or even increase if the magnitude or structure of the errors in predictor measurements differs between the two settings. This provides an important starting point for researchers to better understand how differences in measurement methods can affect the performance of a prediction model when externally validating or implementing it in practice.