Ma Junchao, Lee Donald K K, Perkins Michael E, Pisani Margaret A, Pinker Edieal
School of Management, Yale University, New Haven, CT.
Goizueta Business School, Emory University, Atlanta, GA.
Crit Care Explor. 2019 Apr 17;1(4):e0010. doi: 10.1097/CCE.0000000000000010. eCollection 2019 Apr.
Observational, retrospective study of patient medical records for training and testing of statistical learning models using different sets of predictor variables.
Medical ICU at the Yale-New Haven Hospital.
Electronic health records of 3,763 patients admitted to the medical ICU between January 2013 and January 2015.
None.
Six-hour mortality predictions for ICU patients were generated and updated every 6 hours by applying the random forest classifier to patient time series data from the prior 24 hours. The time series were processed in different ways to create two main models: 1) manual extraction of the summary statistics used in the literature (min/max/median/first/last/number of measurements) and 2) automated extraction of trajectory features using machine learning. Out-of-sample area under the receiver operating characteristics curve and area under the precision-recall curve ("precision" refers to positive predictive value and "recall" to sensitivity) were used to evaluate the predictive performance of the two models. For 6-hour prediction and updating, the second model achieved area under the receiver operating characteristics curve and area under the precision-recall curve of 0.905 (95% CI, 0.900-0.910) and 0.381 (95% CI, 0.368-0.394), respectively. Both values are statistically significantly higher than those achieved by the first model, whose area under the receiver operating characteristics curve and area under the precision-recall curve were 0.896 (95% CI, 0.892-0.900) and 0.366 (95% CI, 0.353-0.379), respectively. The superiority of the second model held true for 12-hour prediction/updating as well as for 24-hour prediction/updating.
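The summary statistics used by the first model can be illustrated with a minimal sketch. It assumes each variable is stored as a time-sorted list of (hour, value) pairs; the function names, the heart-rate values, and the half-open window convention are hypothetical choices made for illustration and are not taken from the study.

```python
from statistics import median


def summary_features(values):
    """Min/max/median/first/last/count for one variable within a window."""
    if not values:
        # No measurements in the window: only the count is defined.
        return {"min": None, "max": None, "median": None,
                "first": None, "last": None, "count": 0}
    return {
        "min": min(values),
        "max": max(values),
        "median": median(values),
        "first": values[0],
        "last": values[-1],
        "count": len(values),
    }


def window_features(series, t_now, lookback=24.0):
    """Summarize measurements taken in the `lookback` hours before t_now.

    `series` is a time-sorted list of (hour, value) pairs (assumed layout).
    The window is half-open: t_now - lookback <= hour < t_now.
    """
    window = [v for t, v in series if t_now - lookback <= t < t_now]
    return summary_features(window)


# Hypothetical heart-rate measurements (hours since ICU admission, beats/min).
heart_rate = [(1.0, 88), (5.5, 92), (11.0, 104),
              (17.5, 110), (23.0, 121), (26.0, 118)]

# Recompute the features every 6 hours from the prior 24 hours.
for t_now in (24.0, 30.0):
    print(t_now, window_features(heart_rate, t_now))
```

In a pipeline like the one described, feature vectors of this kind, one per variable and per update time, would be passed to a classifier such as a random forest. The second model replaces these hand-picked statistics with automatically extracted trajectory features, which this sketch does not attempt to reproduce.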
We show that statistical learning techniques can be used to automatically extract all relevant shape features for use in predictive modeling. The approach requires no additional data and can potentially be used to improve any risk model that uses some form of trajectory information. In this single-center study, the shapes of the clinical data trajectories convey information about ICU mortality risk beyond what is already captured by the summary statistics currently used in the literature.