Ho Joyce C, Staimez Lisa R, Narayan K M Venkat, Ohno-Machado Lucila, Simpson Roy L, Hertzberg Vicki Stover
Department of Computer Science, Emory University, 400 Dowman Drive, Atlanta, GA 30322, United States.
Hubert Department of Global Health, Rollins School of Public Health, Emory University, United States.
Comput Methods Programs Biomed Update. 2023;3. doi: 10.1016/j.cmpbup.2022.100087. Epub 2022 Dec 19.
Various cardiovascular risk prediction models have been developed for patients with type 2 diabetes mellitus. Yet few models have been validated externally. We perform a comprehensive validation of existing risk models on a heterogeneous population of patients with type 2 diabetes using secondary analysis of electronic health record data.
Electronic health records from 2013 to 2017 for 47,988 patients with type 2 diabetes were used to validate 16 cardiovascular risk models, including 5 that had not been compared previously, in estimating the 1-year risk of various cardiovascular outcomes. Discrimination and calibration were assessed by the c-statistic and the Hosmer-Lemeshow goodness-of-fit statistic, respectively. Each model was also evaluated on its missing measurement rate. A sub-analysis was performed to determine the impact of race on discrimination performance.
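As an illustration of the two metrics named above, the following is a minimal sketch of how a c-statistic and a Hosmer-Lemeshow statistic could be computed from binary outcomes and predicted risks. It is not the cvdm package's API. The function names `c_statistic` and `hosmer_lemeshow`, the choice of 10 equal-size risk groups, and the `groups - 2` degrees of freedom are assumptions made for this example.

```python
from typing import Sequence, Tuple


def c_statistic(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random event case receives a higher predicted
    risk than a random non-event case (ties count as 0.5)."""
    n = len(y_prob)
    order = sorted(range(n), key=lambda i: y_prob[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied predictions and give each the average rank.
        j = i
        while j + 1 < n and y_prob[order[j + 1]] == y_prob[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one event and one non-event")
    # Mann-Whitney U of the event cases, scaled to [0, 1].
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def hosmer_lemeshow(
    y_true: Sequence[int], y_prob: Sequence[float], groups: int = 10
) -> Tuple[float, int]:
    """Return (statistic, degrees of freedom) using equal-size groups
    of patients sorted by predicted risk."""
    n = len(y_prob)
    order = sorted(range(n), key=lambda i: y_prob[i])
    stat = 0.0
    used = 0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue  # fewer patients than groups
        observed = sum(y_true[i] for i in idx)
        expected = sum(y_prob[i] for i in idx)
        mean_p = expected / len(idx)
        denom = len(idx) * mean_p * (1.0 - mean_p)
        if denom == 0.0:
            raise ValueError("group with predicted risk of exactly 0 or 1")
        stat += (observed - expected) ** 2 / denom
        used += 1
    return stat, used - 2


# Toy data, not taken from the study.
auc = c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
hl_stat, hl_df = hosmer_lemeshow([0, 0, 1, 1], [0.2, 0.2, 0.8, 0.8], groups=2)
```

A p-value for the Hosmer-Lemeshow statistic would then come from the upper tail of a chi-square distribution with the returned degrees of freedom, for example via `scipy.stats.chi2.sf(stat, df)` when SciPy is available.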
Discrimination was limited across the cardiovascular risk models (c-statistics ranged from 0.51 to 0.67). Discrimination generally improved when a model was tailored to the individual outcome. After recalibration of the models, the Hosmer-Lemeshow statistic yielded p-values above 0.05. However, several of the models with the best discrimination relied on measurements that were often imputed (up to 39% missing).
No single prediction model achieved the best performance across the full range of cardiovascular endpoints. Moreover, several of the highest-scoring models relied on variables with high rates of missingness, such as HbA1c and cholesterol, which necessitated data imputation; these models may therefore be less useful in practice. Our Python package, cvdm, is available as open source for comparisons using other data sources.