Developing Clinical Prediction Models Using Primary Care Electronic Health Record Data: The Impact of Data Preparation Choices on Model Performance.

Author information

van Os Hendrikus J A, Kanning Jos P, Wermer Marieke J H, Chavannes Niels H, Numans Mattijs E, Ruigrok Ynte M, van Zwet Erik W, Putter Hein, Steyerberg Ewout W, Groenwold Rolf H H

Affiliations

Department of Neurology, Leiden University Medical Hospital, Leiden, Netherlands.

National eHealth Living Lab, Leiden University Medical Hospital, Leiden, Netherlands.

Publication information

Front Epidemiol. 2022 Jun 2;2:871630. doi: 10.3389/fepid.2022.871630. eCollection 2022.

Abstract

OBJECTIVE

To quantify prediction model performance in relation to data preparation choices when using electronic health records (EHR).

STUDY DESIGN AND SETTING

Cox proportional hazards models were developed for predicting the first-ever main adverse cardiovascular events using Dutch primary care EHR data. The reference model was based on a 1-year run-in period, cardiovascular events were defined based on both EHR diagnosis and medication codes, and missing values were multiply imputed. We compared data preparation choices based on (i) length of the run-in period (2- or 3-year run-in); (ii) outcome definition (EHR diagnosis codes or medication codes only); and (iii) methods addressing missing values (mean imputation or complete case analysis). Each variation was applied to the derivation set, and its impact was tested in a validation set.
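The two simpler missing-value strategies named in (iii) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the field names (`sbp`, `chol`), and the toy records are all hypothetical, and multiple imputation (the reference approach) is not shown.

```python
from statistics import mean


def complete_cases(rows, fields):
    """Complete case analysis: keep only records with every field observed."""
    return [r for r in rows if all(r.get(f) is not None for f in fields)]


def mean_impute(rows, fields):
    """Mean imputation: replace each missing value with the observed mean."""
    means = {
        f: mean(r[f] for r in rows if r.get(f) is not None) for f in fields
    }
    imputed = []
    for r in rows:
        filled = dict(r)  # copy so the input records are left unchanged
        for f in fields:
            if filled.get(f) is None:
                filled[f] = means[f]
        imputed.append(filled)
    return imputed


# Hypothetical toy records; None marks a value missing from the EHR.
records = [
    {"sbp": 120, "chol": 5.0},
    {"sbp": None, "chol": 6.0},
    {"sbp": 140, "chol": None},
]
fields = ["sbp", "chol"]

kept = complete_cases(records, fields)
filled = mean_impute(records, fields)
```

In this toy data, complete case analysis discards two of three records, while mean imputation keeps all three but fills the gaps with a single value per field. Either choice changes the data the model is derived from.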

RESULTS

We included 89,491 patients in whom 6,736 first-ever main adverse cardiovascular events occurred during a median follow-up of 8 years. Outcome definition based only on diagnosis codes led to a systematic underestimation of risk (calibration curve intercept: 0.84; 95% CI: 0.83 to 0.84), while complete case analysis led to overestimation (calibration curve intercept: -0.52; 95% CI: -0.53 to -0.51). Differences in the length of the run-in period showed no relevant impact on calibration and discrimination.
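To show why a positive calibration intercept signals underestimation and a negative one overestimation, the sketch below estimates an intercept for a binary outcome with the calibration slope fixed at 1. This is a simplification under stated assumptions: the study used Cox models with time-to-event data, the abstract does not give its exact computation, and the function names and toy numbers here are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def calibration_intercept(predicted, observed, iterations=50):
    """Estimate a in logit(P(y=1)) = a + logit(p_hat), slope fixed at 1.

    Uses Newton's method on the one-parameter logistic log-likelihood.
    a > 0: observed risk exceeds predicted risk (underestimation).
    a < 0: predicted risk exceeds observed risk (overestimation).
    """
    offsets = [logit(p) for p in predicted]
    a = 0.0
    for _ in range(iterations):
        fitted = [1.0 / (1.0 + math.exp(-(a + o))) for o in offsets]
        score = sum(y - f for y, f in zip(observed, fitted))
        information = sum(f * (1.0 - f) for f in fitted)
        step = score / information
        a += step
        if abs(step) < 1e-10:
            break
    return a


# Hypothetical cohort: the model predicts 10% risk, but 20% have events.
under = calibration_intercept([0.1] * 100, [1] * 20 + [0] * 80)

# Hypothetical cohort: the model predicts 20% risk, but 10% have events.
over = calibration_intercept([0.2] * 100, [1] * 10 + [0] * 90)
```

With a constant prediction, the estimate reduces to the difference between the log-odds of the observed event rate and the log-odds of the predicted risk, so `under` is positive and `over` is negative.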

CONCLUSION

Data preparation choices regarding outcome definition or methods to address missing values can have a substantial impact on the calibration of predictions, hampering reliable clinical decision support. This study further illustrates the urgency of transparent reporting of modeling choices in an EHR data setting.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e5b/10910909/bd200fb43dfc/fepid-02-871630-g0001.jpg
