Development, validation and recalibration of a prediction model for prediabetes: an EHR and NHANES-based study.

Author information

Casacchia Nicholas J, Lenoir Kristin M, Rigdon Joseph, Wells Brian J

Affiliations

Center for Value-Based Care Research, Primary Care Institute, Cleveland Clinic, 9500 Euclid Ave, G10, Cleveland, OH, 44195, USA.

Division of Public Health Sciences, Department of Biostatistics and Data Science, Wake Forest University School of Medicine, 525 Vine St, Winston-Salem, NC, 27101, USA.

Publication information

BMC Med Inform Decis Mak. 2024 Dec 18;24(1):387. doi: 10.1186/s12911-024-02803-w.

Abstract

BACKGROUND

A prediction model that estimates the risk of elevated glycated hemoglobin (HbA1c) was developed from electronic health record (EHR) data to identify adult patients at risk for prediabetes who may otherwise go undetected. We aimed to assess the internal performance of a new penalized regression model using the same EHR data and compare it to the previously developed stepdown approximation for predicting HbA1c ≥ 5.7%, the cut-off for prediabetes. Additionally, we sought to externally validate and recalibrate the approximation model using 2017-2020 pre-pandemic National Health and Nutrition Examination Survey (NHANES) data.

METHODS

We developed logistic regression models using EHR data through two approaches: the Least Absolute Shrinkage and Selection Operator (LASSO) and stepdown approximation. Internal validation was performed using the bootstrap method, with internal performance evaluated by the Brier score, C-statistic, calibration intercept and slope, and the integrated calibration index. We externally validated the approximation model by applying original model coefficients to NHANES, and we examined the approximation model's performance after recalibration in NHANES.
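The performance measures and the recalibration step named above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy outcomes `y_demo` and the toy predicted risks `p_demo` are all hypothetical. Recalibration is assumed here to mean logistic recalibration, that is, refitting an intercept and a slope on the logit of the original predicted risk; the abstract does not specify the exact method. Note also that the calibration intercept is often estimated separately with the slope fixed at 1, which this sketch does not do.

```python
import math

def brier_score(y, p):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def c_statistic(y, p):
    """Share of case/non-case pairs where the case has the higher risk (ties count 0.5)."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    concordant = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:
                concordant += 1.0
            elif pc == pn:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))

def logit(p):
    return math.log(p / (1.0 - p))

def fit_recalibration(y, p, iterations=25):
    """Fit outcome ~ a + b * logit(p) by Newton-Raphson; returns (a, b)."""
    x = [logit(pi) for pi in p]
    a, b = 0.0, 1.0  # start from "no recalibration needed"
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for yi, xi in zip(y, x):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu          # gradient with respect to a
            g1 += (yi - mu) * xi   # gradient with respect to b
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def recalibrate(p, a, b):
    """Apply the fitted intercept and slope to the original predicted risks."""
    return [1.0 / (1.0 + math.exp(-(a + b * logit(pi)))) for pi in p]

# Hypothetical toy data: observed outcomes and risks from an existing model.
y_demo = [0, 0, 1, 0, 1, 1, 0, 1]
p_demo = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

a_hat, b_hat = fit_recalibration(y_demo, p_demo)
p_recal = recalibrate(p_demo, a_hat, b_hat)
print("Brier:", brier_score(y_demo, p_demo))
print("C-statistic:", c_statistic(y_demo, p_demo))
print("Recalibration intercept and slope:", a_hat, b_hat)
```

Because the recalibration is a monotone transformation of the original risks when the slope is positive, it changes calibration but leaves the C-statistic unchanged.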

RESULTS

The EHR cohort included 22,635 patients, with 26% identified as having prediabetes. Both the LASSO and approximation models demonstrated similar discrimination in the EHR cohort, with optimism-corrected C-statistics of 0.760 and 0.763, respectively. The LASSO model included 23 predictor variables, while the approximation model contained 8. Among the 2,348 NHANES participants who met the inclusion criteria, 30.1% had prediabetes. External validation of the LASSO model was not possible due to the unavailability of some predictor variables. The approximation model discriminated well in the NHANES dataset, achieving a C-statistic of 0.787.
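An optimism-corrected C-statistic of the kind reported above is commonly obtained with a bootstrap procedure: refit the model on each bootstrap sample, compare its performance on that sample with its performance on the original data, and subtract the average difference from the apparent C-statistic. The sketch below shows this idea under stated assumptions. It uses a plain, unpenalized logistic regression fitted by gradient ascent on simulated data, not the LASSO or stepdown models from the study, and every name and parameter value in it is hypothetical.

```python
import math
import random

def c_statistic(y, p):
    """Share of case/non-case pairs where the case has the higher risk (ties count 0.5)."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    score = sum(1.0 if pc > pn else 0.5 if pc == pn else 0.0
                for pc in cases for pn in controls)
    return score / (len(cases) * len(controls))

def predict(beta, X):
    """Predicted risks from coefficients [intercept, coef_1, ...]."""
    return [1.0 / (1.0 + math.exp(-(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))))
            for xi in X]

def fit_logistic(X, y, steps=200, lr=0.5):
    """Unpenalized logistic regression by gradient ascent (a stand-in model)."""
    n, k = len(y), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(steps):
        p = predict(beta, X)
        grad = [0.0] * (k + 1)
        for xi, yi, pi in zip(X, y, p):
            err = yi - pi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def optimism_corrected_c(X, y, n_boot=30, seed=0):
    """Return (apparent C, optimism-corrected C) using bootstrap resampling."""
    rng = random.Random(seed)
    n = len(y)
    apparent = c_statistic(y, predict(fit_logistic(X, y), X))
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if sum(yb) in (0, n):
            continue  # skip samples without both outcome classes
        beta_b = fit_logistic(Xb, yb)
        c_boot = c_statistic(yb, predict(beta_b, Xb))  # performance on the bootstrap sample
        c_orig = c_statistic(y, predict(beta_b, X))    # same model on the original data
        optimism.append(c_boot - c_orig)
    if not optimism:
        return apparent, apparent
    return apparent, apparent - sum(optimism) / len(optimism)

# Simulated data for illustration only: two predictors, binary outcome.
sim = random.Random(1)
X_demo = [[sim.gauss(0, 1), sim.gauss(0, 1)] for _ in range(120)]
y_demo = [1 if sim.random() < 1.0 / (1.0 + math.exp(-(-1.0 + x1 + 0.5 * x2))) else 0
          for x1, x2 in X_demo]

apparent_c, corrected_c = optimism_corrected_c(X_demo, y_demo)
print("Apparent C:", apparent_c, "Optimism-corrected C:", corrected_c)
```

The number of bootstrap replicates is kept small here so the example runs quickly; a real analysis would typically use several hundred.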

CONCLUSION

The approximation method demonstrated comparable performance to LASSO in the EHR development cohort, making it a viable option for healthcare organizations with limited resources to collect a comprehensive set of candidate predictor variables. NHANES data may be suitable for externally validating a clinical prediction model developed with EHR data to assess generalizability to a nationally representative sample, depending on the model's intended use and the alignment of predictor variable definitions with those used in the model's original development.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fbc4/11657225/766fa0bf741d/12911_2024_2803_Fig1_HTML.jpg