Morise A P, Diamond G A, Detrano R, Bobbio M, Gunel E
Department of Medicine, West Virginia University School of Medicine, Morgantown 26506, USA.
Med Decis Making. 1996 Apr-Jun;16(2):133-42. doi: 10.1177/0272989X9601600205.
The accuracy of a logistic prediction model is degraded when it is transported to populations with outcome prevalences different from that of the population used to derive the model. The resultant errors can have major clinical implications. Accordingly, the authors developed a logistic prediction model with respect to the noninvasive diagnosis of coronary disease based on 1,824 patients who underwent exercise testing and coronary angiography, varied the prevalence of disease in various "test" populations by random sampling of the original "derivation" population, and determined the accuracy of the logistic prediction model before and after the application of a mathematical algorithm designed to adjust only for these differences in prevalence. The accuracy of each prediction model was quantified in terms of receiver operating characteristic (ROC) curve area (discrimination) and chi-square goodness-of-fit (calibration). As the prevalence of the test population diverged from the prevalence of the derivation population, discrimination improved (ROC-curve areas increased from 0.82 +/- 0.02 to 0.87 +/- 0.03; p < 0.05), and calibration deteriorated (chi-square goodness-of-fit statistics increased from 9 to 154; p < 0.05). Following adjustment of the logistic intercept for differences in prevalence, discrimination was unchanged and calibration improved (maximum chi-square goodness-of-fit fell from 154 to 16). When the adjusted algorithm was applied to three geographically remote populations with prevalences that differed from that of the derivation population, calibration improved 87%, while discrimination fell by 1%. Thus, prevalence differences produce statistically significant and potentially clinically important errors in the accuracy of logistic prediction models. These errors can potentially be mitigated by use of a relatively simple mathematical correction algorithm.
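The abstract does not spell out the correction algorithm. The following is a minimal sketch of one common way to adjust a logistic intercept for a prevalence difference, assuming the correction is a constant offset on the log-odds scale equal to the difference between the logit of the target prevalence and the logit of the derivation prevalence. The function names and the example prevalences are hypothetical and are not taken from the paper.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit: map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def adjust_probability(p_model, prev_derivation, prev_target):
    """Shift a model probability by the log-odds offset between two prevalences.

    Assumption: this is equivalent to adding the offset to the model's
    intercept and leaving all other coefficients unchanged.
    """
    offset = logit(prev_target) - logit(prev_derivation)
    return sigmoid(logit(p_model) + offset)


# Hypothetical example: model derived at 50% prevalence, applied at 20%.
adjusted = adjust_probability(0.5, prev_derivation=0.5, prev_target=0.2)
```

Because the offset is the same for every patient, the ordering of predicted probabilities within a population is preserved. That is consistent with the reported finding that discrimination was unchanged while calibration improved after the intercept adjustment.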