School of Mechanical, Industrial and Manufacturing Engineering, Oregon State University, Corvallis, OR 97331-6001, United States.
Edward P. Fitts Department of Industrial and Systems Engineering, North Carolina State University, 400 Daniels Hall, Raleigh, NC 27695-7906, United States.
J Biomed Inform. 2019 Sep;97:103255. doi: 10.1016/j.jbi.2019.103255. Epub 2019 Jul 23.
We aim to investigate the hypothesis that using information about which variables are missing along with appropriate imputation improves the performance of severity of illness scoring systems used to predict critical patient outcomes.
We quantify the impact of missing and imputed variables on the performance of prediction models used in the development of a sepsis-related severity of illness scoring system. Electronic health records (EHR) data were compiled from Christiana Care Health System (CCHS) on 119,968 adult patients hospitalized between July 2013 and December 2015. Two outcomes of interest were considered for prediction: (1) first transfer to intensive care unit (ICU) and (2) in-hospital mortality. Five different prediction models were employed. Indicators were utilized in these prediction models to identify when variables were missing and imputed.
We observed statistically significant gains in prediction performance when moving from models that did not indicate missing information to those that did. Moreover, the gain was larger in models that use summary variables as predictors than in those that use all variables.
When developing prediction models using longitudinal EHR data, researchers should explore the incorporation of indicators for missing variables along with appropriate imputation.
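As a minimal sketch of what pairing missingness indicators with imputation can look like, the Python snippet below adds a 0/1 flag for each variable and fills gaps with the mean of the observed values. The abstract does not say which imputation method the study used, so mean imputation is an assumption here, and the function name, variable names, and patient values are hypothetical illustrations rather than the authors' implementation or data.

```python
from statistics import mean


def add_missing_indicators(rows, variables):
    """Return copies of rows with imputed values and 0/1 missingness flags."""
    # Fill value per variable: mean of the observed values.
    # Mean imputation is an assumption; the abstract does not name a method.
    fill = {}
    for v in variables:
        observed = [r[v] for r in rows if r.get(v) is not None]
        fill[v] = mean(observed) if observed else 0.0

    augmented_rows = []
    for r in rows:
        new_row = dict(r)  # copy so the input rows are left unchanged
        for v in variables:
            is_missing = r.get(v) is None
            new_row[v] = fill[v] if is_missing else r[v]
            # The indicator lets a downstream model see that the value was imputed.
            new_row[v + "_missing"] = 1 if is_missing else 0
        augmented_rows.append(new_row)
    return augmented_rows


# Hypothetical example records; None marks a variable that was not measured.
patients = [
    {"lactate": 2.0, "heart_rate": 90},
    {"lactate": None, "heart_rate": 110},
    {"lactate": 4.0, "heart_rate": None},
]
augmented = add_missing_indicators(patients, ["lactate", "heart_rate"])
```

Each augmented record keeps one column per original variable plus one `_missing` column, so a prediction model can draw on both the imputed value and the fact that it was imputed.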