Developing Clinical Prediction Models Using Primary Care Electronic Health Record Data: The Impact of Data Preparation Choices on Model Performance.

Author information

van Os Hendrikus J A, Kanning Jos P, Wermer Marieke J H, Chavannes Niels H, Numans Mattijs E, Ruigrok Ynte M, van Zwet Erik W, Putter Hein, Steyerberg Ewout W, Groenwold Rolf H H

Affiliations

Department of Neurology, Leiden University Medical Hospital, Leiden, Netherlands.

National eHealth Living Lab, Leiden University Medical Hospital, Leiden, Netherlands.

Publication information

Front Epidemiol. 2022 Jun 2;2:871630. doi: 10.3389/fepid.2022.871630. eCollection 2022.

DOI:10.3389/fepid.2022.871630

PMID:38455328

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10910909/

Abstract

OBJECTIVE

To quantify prediction model performance in relation to data preparation choices when using electronic health records (EHR).

STUDY DESIGN AND SETTING

Cox proportional hazards models were developed for predicting the first-ever main adverse cardiovascular events using Dutch primary care EHR data. The reference model was based on a 1-year run-in period, cardiovascular events were defined based on both EHR diagnosis and medication codes, and missing values were multiply imputed. We compared data preparation choices based on (i) length of the run-in period (2- or 3-year run-in); (ii) outcome definition (EHR diagnosis codes or medication codes only); and (iii) methods addressing missing values (mean imputation or complete case analysis) by making variations on the derivation set and testing their impact in a validation set.
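
The two simpler alternatives to multiple imputation named in (iii) can be illustrated with a minimal sketch. The records, variable names (`age`, `sbp`, `chol`), and helper functions below are hypothetical and are not taken from the study; the sketch only shows how complete case analysis shrinks the derivation set while mean imputation keeps every patient but fills gaps with a single value.

```python
from statistics import mean

# Hypothetical toy records (not study data); None marks a value
# that was never recorded in the EHR.
records = [
    {"age": 54, "sbp": 132.0, "chol": 5.1},
    {"age": 61, "sbp": None, "chol": 6.0},
    {"age": 47, "sbp": 118.0, "chol": None},
    {"age": 70, "sbp": 150.0, "chol": 5.5},
]
predictors = ["age", "sbp", "chol"]


def complete_cases(rows, cols):
    """Complete case analysis: keep only rows with every predictor observed."""
    return [r for r in rows if all(r[c] is not None for c in cols)]


def mean_impute(rows, cols):
    """Mean imputation: replace each missing value with the observed column mean."""
    means = {c: mean(r[c] for r in rows if r[c] is not None) for c in cols}
    return [
        {c: (r[c] if r[c] is not None else means[c]) for c in cols}
        for r in rows
    ]


cc = complete_cases(records, predictors)
imputed = mean_impute(records, predictors)

print(f"complete cases kept: {len(cc)} of {len(records)}")
print(f"imputed sbp for second patient: {imputed[1]['sbp']:.1f}")
```

In this toy example half of the patients are dropped by complete case analysis. If recording depends on patient characteristics, as is often the case in routine care data, the retained patients may differ systematically from the full population.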

RESULTS

We included 89,491 patients in whom 6,736 first-ever main adverse cardiovascular events occurred during a median follow-up of 8 years. Outcome definition based only on diagnosis codes led to a systematic underestimation of risk (calibration curve intercept: 0.84; 95% CI: 0.83-0.84), while complete case analysis led to overestimation (calibration curve intercept: -0.52; 95% CI: -0.53 to -0.51). Differences in the length of the run-in period showed no relevant impact on calibration and discrimination.
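
To make the sign convention of the calibration intercept concrete, the following sketch estimates it on simulated data. It rests on simplifying assumptions that are not from the study: the outcome is treated as binary at a fixed time horizon (censoring is ignored, unlike in a Cox model), the slope is fixed at 1, and the data, shifts, and function names are invented for illustration. A positive intercept means observed risk exceeds predicted risk (underestimation); a negative intercept means the opposite (overestimation).

```python
import math
import random


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def calibration_intercept(y, p, tol=1e-10, max_iter=100):
    """Estimate a in logit(P(y=1)) = a + logit(p), with the slope fixed at 1.

    This is an intercept-only logistic regression with logit(p) as offset,
    solved here by Newton-Raphson.
    """
    offsets = [logit(pi) for pi in p]
    a = 0.0
    for _ in range(max_iter):
        fitted = [expit(a + o) for o in offsets]
        score = sum(yi - fi for yi, fi in zip(y, fitted))
        information = sum(fi * (1.0 - fi) for fi in fitted)
        step = score / information
        a += step
        if abs(step) < tol:
            break
    return a


def simulate(shift, n=20000, seed=0):
    """Simulated predictions p and outcomes whose true log-odds are logit(p) + shift."""
    rng = random.Random(seed)
    p = [rng.uniform(0.02, 0.30) for _ in range(n)]
    y = [1 if rng.random() < expit(logit(pi) + shift) else 0 for pi in p]
    return y, p


# True risk higher than predicted: the model underestimates risk.
y_under, p_under = simulate(shift=0.8)
# True risk lower than predicted: the model overestimates risk.
y_over, p_over = simulate(shift=-0.5)

a_under = calibration_intercept(y_under, p_under)
a_over = calibration_intercept(y_over, p_over)

print(f"intercept when risk is underestimated: {a_under:.2f}")
print(f"intercept when risk is overestimated: {a_over:.2f}")
```

The estimated intercepts should land close to the simulated shifts, matching the direction of the values reported above: positive for the diagnosis-code-only outcome definition and negative for complete case analysis.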

CONCLUSION

Data preparation choices regarding outcome definition or methods to address missing values can have a substantial impact on the calibration of predictions, hampering reliable clinical decision support. This study further illustrates the urgency of transparent reporting of modeling choices in an EHR data setting.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e5b/10910909/bd200fb43dfc/fepid-02-871630-g0001.jpg
