Division of Pulmonary, Allergy, and Critical Care, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA.
Palliative and Advanced Illness Research Center, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA.
Crit Care Med. 2018 Jul;46(7):1125-1132. doi: 10.1097/CCM.0000000000003148.
Objectives: Early prediction of undesired outcomes among newly hospitalized patients could improve patient triage and prompt conversations about patients' goals of care. We evaluated the performance of logistic regression, gradient boosting machine, random forest, and elastic net regression models, with and without unstructured clinical text data, to predict a binary composite outcome of in-hospital death or ICU length of stay greater than or equal to 7 days using data from the first 48 hours of hospitalization.
Design: Retrospective cohort study with split sampling for model training and testing.
Setting: A single urban academic hospital.
Patients: All hospitalized patients who required ICU care at the Beth Israel Deaconess Medical Center in Boston, MA, from 2001 to 2012.
Interventions: None.
Measurements and Main Results: Among 25,947 eligible hospital admissions, we observed 5,504 (21.2%) in which patients died or had an ICU length of stay greater than or equal to 7 days. The gradient boosting machine model had the highest discrimination both without (area under the receiver operating characteristic curve, 0.83; 95% CI, 0.81-0.84) and with (area under the receiver operating characteristic curve, 0.89; 95% CI, 0.88-0.90) text-derived variables. Both gradient boosting machines and random forests outperformed logistic regression without text data (p < 0.001), whereas all models outperformed logistic regression with text data (p < 0.02). The inclusion of text data increased the discrimination of all four model types (p < 0.001). Among the models using text data, increasing presence of the terms "intubated" and "poor prognosis" was positively associated with mortality and ICU length of stay, whereas the term "extubated" was inversely associated with them.
Conclusions: Variables extracted from unstructured clinical text from the first 48 hours of hospital admission using natural language processing techniques significantly improved the ability of logistic regression and other machine learning models to predict which patients died or had long ICU stays. Learning health systems may adapt such models using open-source approaches to capture local variation in care patterns.
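As an illustration only, the composite outcome and the discrimination metric reported above can be sketched in a few lines of Python. This is a minimal sketch rather than the study's code: the function names (`composite_outcome`, `auc`, `bootstrap_auc_ci`) are hypothetical, and the percentile bootstrap used here for the 95% CI is an assumption, since the abstract does not state how its intervals were computed.

```python
import random


def composite_outcome(died_in_hospital, icu_los_days):
    """Binary composite label: in-hospital death OR ICU length of stay >= 7 days."""
    return int(bool(died_in_hospital) or icu_los_days >= 7)


def auc(labels, scores):
    """Area under the ROC curve as the fraction of (positive, negative) pairs
    in which the positive case scores higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, not the paper's)."""
    auc(labels, scores)  # fail fast if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain a single class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Toy admissions: (died_in_hospital, icu_los_days, predicted_risk)
    toy = [(False, 2, 0.10), (False, 3, 0.40), (False, 9, 0.35), (True, 1, 0.80)]
    y = [composite_outcome(d, los) for d, los, _ in toy]
    risk = [r for _, _, r in toy]
    print("AUC:", auc(y, risk))
    print("95% CI:", bootstrap_auc_ci(y, risk, n_boot=200))
```

The pairwise loop is quadratic in the number of admissions, which is fine for a toy example; a cohort of the size described here would call for a rank-based implementation or an established library routine.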