Baxter Sally L, Klie Adam R, Saseendrakumar Bharanidharan Radha, Ye Gordon Y, Hogarth Michael, Nemati Shamim
Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:5459-5463. doi: 10.1109/EMBC44109.2020.9175287.
Fungemia is a life-threatening infection, yet few models exist to predict in-patient mortality among patients with this infection. In this study, we developed models predicting all-cause in-hospital mortality among 265 fungemic patients in the Medical Information Mart for Intensive Care (MIMIC-III) database, using both structured and unstructured data. Structured data models included multivariable logistic regression, extreme gradient boosting, and stacked ensemble models. Unstructured data models were developed using Amazon Comprehend Medical and BioWordVec embeddings with logistic regression, convolutional neural networks (CNNs), and recurrent neural networks (RNNs). We evaluated models trained on all notes, on notes from only the first three days of hospitalization, and on physician notes only. The best-performing structured data model was a multivariable logistic regression model, which achieved an accuracy of 0.74 and an area under the receiver operating characteristic curve (AUC) of 0.76. Liver disease, acute renal failure, and intubation were among the top features driving prediction in multiple models. The best-performing unstructured data models used the Amazon Comprehend Medical document classifier and CNNs, achieving accuracies of 0.99 to 1.00 and AUCs of 1.00. CNNs using unstructured data achieved similar performance even when trained on notes from only the first three days of hospitalization. Therefore, unstructured data, particularly notes composed by physicians, offer added predictive value over models based on structured data alone.
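The abstract reports accuracy and AUC for each model but does not say how these metrics were computed. As a hedged illustration only, the sketch below shows both metrics as they are commonly defined for a binary in-hospital mortality classifier: accuracy at an assumed 0.5 probability threshold, and AUC in its pairwise (Mann-Whitney) form, i.e. the probability that a randomly chosen death is scored higher than a randomly chosen survivor. The function names, the threshold, and the toy labels and scores are hypothetical and are not the authors' code or data.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of patients whose thresholded score matches the observed outcome.

    The 0.5 default threshold is an assumption; the study's threshold is not stated.
    """
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen death (label 1) receives a
    higher score than a randomly chosen survivor (label 0); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy labels (1 = in-hospital death) and predicted probabilities;
    # these are NOT data from the study.
    y = [1, 0, 1, 0, 0, 1]
    p = [0.9, 0.2, 0.4, 0.6, 0.1, 0.7]
    print(f"accuracy={accuracy(y, p):.2f} auc={roc_auc(y, p):.2f}")  # prints "accuracy=0.67 auc=0.89"
```

The pairwise form is quadratic in the number of patients but is transparent and adequate for a cohort of a few hundred; libraries such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.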