Subramanian Vigneshwar, Mascha Edward J, Kattan Michael W
From the Cleveland Clinic Lerner College of Medicine at Case Western Reserve University, Cleveland, Ohio.
Departments of Quantitative Health Sciences and Outcomes Research and.
Anesth Analg. 2021 Jun 1;132(6):1603-1613. doi: 10.1213/ANE.0000000000005362.
Researchers often convert prediction tools built on statistical regression models into integer scores and risk classification systems in the name of simplicity. However, this workflow discards useful information and reduces prediction accuracy. We therefore used a simulation study and an example clinical data set to investigate how prediction accuracy changes when researchers simplify a regression model into an integer score. Simulated independent training and test sets (n = 1000) were randomly generated such that a logistic regression model would perform at a specified target area under the receiver operating characteristic curve (AUC) of 0.7, 0.8, or 0.9. After fitting a logistic regression with continuous covariates to each data set, continuous variables were dichotomized using data-dependent cut points. A logistic regression was refit, and the coefficients were scaled and rounded to create an integer score. A risk classification system was built by stratifying integer scores into low-, intermediate-, and high-risk tertiles. Discrimination and calibration were assessed by calculating the AUC and index of prediction accuracy (IPA) for each model. The optimism in performance between the training set and test set was calculated for both AUC and IPA. The logistic regression model using the continuous form of covariates outperformed all other models. In the simulation study, converting the logistic regression model to an integer score and subsequent risk classification system incurred an average decrease of 0.057-0.094 in AUC and an absolute decrease of 6.2%-17.5% in IPA. The largest decrease in both AUC and IPA occurred in the dichotomization step. The dichotomization and risk stratification steps also increased the optimism of the resulting models, such that they appeared to predict better than they actually would on new data.
In the clinical data set, converting the logistic regression with continuous covariates to an integer score incurred a decrease in externally validated AUC of 0.06 and a decrease in externally validated IPA of 13%. Converting a regression model to an integer score decreases model performance considerably. Therefore, we recommend developing a regression model that incorporates all available information to make the most accurate predictions possible, and using the unaltered regression model when making predictions for individual patients. In all cases, researchers should be mindful that they correctly validate the specific model that is intended for clinical use.
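The scoring and evaluation steps described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' code. The coefficient names and values are hypothetical, and dividing by the smallest absolute coefficient before rounding is only one common scaling convention, assumed here because the abstract does not state which scaling was used. AUC is computed as the proportion of event/non-event pairs ranked correctly (ties count one half), and IPA as 1 minus the ratio of the model's Brier score to that of a null model that predicts the overall event rate for everyone.

```python
def integer_points(coefs):
    """Turn regression coefficients into integer points.

    Assumed convention (not stated in the abstract): divide every
    coefficient by the smallest absolute coefficient, then round.
    """
    base = min(abs(b) for b in coefs.values())
    return {name: round(b / base) for name, b in coefs.items()}


def auc(y_true, scores):
    """Share of (event, non-event) pairs where the event scores higher.

    Tied pairs count one half, which is where coarse integer scores
    lose discrimination relative to a continuous prediction.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(y_true, probs):
    """Mean squared difference between predicted risk and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def ipa(y_true, probs):
    """Index of prediction accuracy: 1 - Brier(model) / Brier(null)."""
    prevalence = sum(y_true) / len(y_true)
    null_brier = brier(y_true, [prevalence] * len(y_true))
    return 1 - brier(y_true, probs) / null_brier


# Hypothetical coefficients from a model refit on dichotomized predictors.
toy_coefs = {"age_over_65": 0.82, "high_creatinine": 0.41, "diabetes": 1.30}
points = integer_points(toy_coefs)
print(points)  # {'age_over_65': 2, 'high_creatinine': 1, 'diabetes': 3}

# Toy outcomes and predicted risks, for illustration only.
y = [0, 0, 0, 1, 1, 1]
risk = [0.10, 0.30, 0.40, 0.35, 0.70, 0.90]
print(round(auc(y, risk), 3), round(ipa(y, risk), 3))
```

Computing `auc` and `ipa` separately on the training and test sets and taking the difference gives the optimism the abstract refers to. Note that `ipa` needs predicted probabilities, so an integer score has to be mapped back to a risk estimate before its IPA can be evaluated.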