Lee Joon
Health Data Science Lab, School of Public Health and Health Systems, University of Waterloo, Waterloo, ON, Canada.
JMIR Med Inform. 2017 Jan 17;5(1):e3. doi: 10.2196/medinform.6690.
With a large-scale electronic health record repository, it is feasible to build a customized patient outcome prediction model specifically for a given patient. This approach involves identifying past patients who are similar to the present patient and using their data to train a personalized predictive model. Our previous work investigated a cosine-similarity patient similarity metric (PSM) for such patient-specific predictive modeling.
The objective of the study is to investigate the random forest (RF) proximity measure as a PSM in the context of personalized mortality prediction for intensive care unit (ICU) patients.
A total of 17,152 ICU admissions were extracted from the Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database. A number of predictor variables were extracted from the first 24 hours in the ICU. The outcome to be predicted was 30-day mortality. A patient-specific predictive model was trained for each ICU admission using an RF PSM inspired by the RF proximity measure. Death counting, logistic regression, decision tree, and RF models were studied with a hard threshold applied to RF PSM values so that only the M most similar patients were included in model training, where M was varied. In addition, case-specific random forests (CSRFs), which use RF proximity for weighted bootstrapping, were trained.
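The selection step described above can be illustrated with a minimal sketch. It assumes the standard RF proximity definition (the fraction of trees in which two patients fall into the same terminal node); the abstract says only that the RF PSM was "inspired by" this measure, so the study's exact metric may differ. The function names and the toy leaf assignments are hypothetical, and the leaf indices are assumed to come from an already fitted forest (for example, scikit-learn's `RandomForestClassifier.apply` returns such indices). Death counting is taken here to mean the observed mortality fraction among the M most similar past patients.

```python
from typing import List, Sequence


def rf_proximity(leaves_a: Sequence[int], leaves_b: Sequence[int]) -> float:
    """Fraction of trees in which two patients land in the same leaf."""
    if len(leaves_a) != len(leaves_b) or len(leaves_a) == 0:
        raise ValueError("leaf vectors must be non-empty and of equal length")
    shared = sum(1 for a, b in zip(leaves_a, leaves_b) if a == b)
    return shared / len(leaves_a)


def top_m_similar(
    query_leaves: Sequence[int],
    train_leaves: Sequence[Sequence[int]],
    m: int,
) -> List[int]:
    """Indices of the m past patients with the highest proximity to the query.

    Ties keep their original order because Python's sort is stable.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    scores = [rf_proximity(query_leaves, row) for row in train_leaves]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:m]


def death_count_risk(
    query_leaves: Sequence[int],
    train_leaves: Sequence[Sequence[int]],
    died: Sequence[int],
    m: int,
) -> float:
    """Mortality fraction among the m most similar past patients."""
    idx = top_m_similar(query_leaves, train_leaves, m)
    return sum(died[i] for i in idx) / len(idx)


# Toy data (hypothetical): 4 past patients, a forest of 3 trees.
# Each row holds the leaf index the patient reached in each tree.
past_leaves = [
    [1, 4, 2],
    [1, 4, 3],
    [2, 5, 3],
    [1, 5, 2],
]
past_died = [1, 0, 0, 1]  # 1 = died within 30 days
new_patient = [1, 4, 2]

neighbours = top_m_similar(new_patient, past_leaves, m=2)
risk = death_count_risk(new_patient, past_leaves, past_died, m=2)
```

The same neighbour indices could instead be used to subset the training data for a logistic regression, decision tree, or RF, which is the role the hard threshold on M plays in the text. The CSRF variant, which weights bootstrap sampling by proximity rather than cutting off at M, is not covered by this sketch.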
Compared to our previous study that investigated a cosine similarity PSM, the RF PSM resulted in superior or comparable predictive performance. RF and CSRF exhibited the best performances (in terms of mean area under the receiver operating characteristic curve [95% confidence interval], RF: 0.839 [0.835-0.844]; CSRF: 0.832 [0.821-0.843]). RF and CSRF did not benefit from personalization via the use of the RF PSM, while the other models did.
The RF PSM led to good mortality prediction performance for several predictive models, although it failed to induce improved performance in RF and CSRF. The distinction between predictor and similarity variables is an important issue arising from the present study. RFs present a promising method for patient-specific outcome prediction.