Goldstein Benjamin A, Navar Ann Marie, Pencina Michael J, Ioannidis John P A
Department of Biostatistics & Bioinformatics, Duke University, Durham, NC 27710, USA.
Center for Predictive Medicine, Duke Clinical Research Institute, Duke University, Durham, NC 27710, USA.
J Am Med Inform Assoc. 2017 Jan;24(1):198-208. doi: 10.1093/jamia/ocw042. Epub 2016 May 17.
Electronic health records (EHRs) are an increasingly common data source for clinical risk prediction, presenting both unique analytic opportunities and challenges. We sought to evaluate the current state of EHR-based risk prediction modeling through a systematic review of clinical prediction studies using EHR data.
We searched PubMed for articles that reported on the use of an EHR to develop a risk prediction model from 2009 to 2014. Articles were extracted by two reviewers, and we abstracted information on study design, use of EHR data, model building, and performance from each publication and supplementary documentation.
We identified 107 articles from 15 different countries. Studies were generally very large (median sample size = 26,100) and used a diverse array of predictors. Most used validation techniques (n = 94 of 107) and reported model coefficients for reproducibility (n = 83). However, studies did not fully leverage the breadth of EHR data: only a minority used longitudinal information (n = 37), and studies employed relatively few predictor variables (median = 27 variables). Fewer than half of the studies were multicenter (n = 50), and only 26 performed validation across sites. Many studies did not fully address biases of EHR data such as missing data or loss to follow-up. Average c-statistics for different outcomes were: mortality (0.84), clinical prediction (0.83), hospitalization (0.71), and service utilization (0.71).
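For readers unfamiliar with the c-statistic reported above, the following is a minimal sketch of how it can be computed for a binary outcome: the fraction of (event, non-event) pairs in which the event received the higher predicted risk, with ties counted as one half. The function name `c_statistic` and the toy data are illustrative assumptions and are not taken from any of the reviewed studies.

```python
from itertools import product


def c_statistic(y_true, y_score):
    """Concordance (c) statistic for a binary outcome.

    y_true:  iterable of 0/1 outcome labels (1 = event occurred).
    y_score: iterable of predicted risks, in the same order as y_true.
    """
    # Split predicted risks by observed outcome.
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    # Compare every event against every non-event.
    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0   # event ranked above non-event
        elif e == n:
            concordant += 0.5   # tie counts as half

    return concordant / (len(events) * len(non_events))


# Toy data (hypothetical): two events, three non-events.
outcomes = [1, 1, 0, 0, 0]
risks = [0.9, 0.4, 0.5, 0.3, 0.1]
print(round(c_statistic(outcomes, risks), 3))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation. This pairwise version is quadratic in sample size, so for cohorts as large as those in the reviewed studies a rank-based implementation would likely be preferable.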
EHR data present both opportunities and challenges for clinical risk prediction. There is room for improvement in designing such studies.