Zhang Xingyu, Kim Joyce, Patzer Rachel E, Pitts Stephen R, Patzer Aaron, Schrager Justin D
Justin D. Schrager, MD, MPH, Emory University School of Medicine, Department of Emergency Medicine, 531 Asbury Circle, Annex Building N340, Atlanta, GA 30322, USA, E-mail:
Methods Inf Med. 2017 Oct 26;56(5):377-389. doi: 10.3414/ME17-01-0024. Epub 2017 Aug 16.
To describe and compare logistic regression and neural network modeling strategies to predict hospital admission or transfer following initial presentation to Emergency Department (ED) triage with and without the addition of natural language processing elements.
Using data from the National Hospital Ambulatory Medical Care Survey (NHAMCS), a cross-sectional probability sample of United States EDs, for the 2012 and 2013 survey years, we developed several predictive models in which the outcome was hospital admission or transfer vs. discharge home. We included patient characteristics available immediately after the patient presented to the ED and underwent triage. We used this information to construct logistic regression (LR) and multilayer neural network (MLNN) models, which incorporated features derived by natural language processing (NLP) and principal component analysis from the patient's reason for visit. Ten-fold cross-validation was used to test the predictive capacity of each model, and the area under the receiver operating characteristic curve (AUC) was then calculated for each model.
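The validation step described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the helper names (`auc`, `k_fold_indices`, `cross_validated_auc`), the toy one-feature model, and the choice to pool out-of-fold scores before computing a single AUC are all assumptions, since the abstract does not state how fold results were combined.

```python
import random

def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) identity, using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def cross_validated_auc(X, y, fit, predict, k=10, seed=0):
    """Fit on k-1 folds, score the held-out fold, then pool all held-out scores."""
    out_of_fold = [0.0] * len(y)
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            out_of_fold[i] = predict(model, X[i])
    return auc(y, out_of_fold)

# Hypothetical toy data: one numeric feature, roughly 13% positives.
rng = random.Random(1)
y = [1 if rng.random() < 0.13 else 0 for _ in range(500)]
X = [rng.gauss(1.0 if label else 0.0, 1.0) for label in y]

def fit(xs, ys):
    # Toy "model": the sign and size of the class-mean difference.
    pos = [x for x, t in zip(xs, ys) if t == 1]
    neg = [x for x, t in zip(xs, ys) if t == 0]
    return sum(pos) / len(pos) - sum(neg) / len(neg)

def predict(direction, x):
    return direction * x

cv_auc = cross_validated_auc(X, y, fit, predict, k=10)
print(f"10-fold cross-validated AUC on toy data: {cv_auc:.3f}")
```

In a fuller pipeline, `fit` and `predict` would wrap the LR or MLNN model, and any text-derived principal components would need to be re-estimated inside each training fold to avoid leaking held-out information.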
Of the 47,200 ED visits from 642 hospitals, 6,335 (13.42%) resulted in hospital admission or transfer. A total of 48 principal components were extracted by NLP from the reason for visit fields, which explained 75% of the overall variance for hospitalization. In the models including only structured variables, the AUC was 0.824 (95% CI 0.818-0.830) for logistic regression and 0.823 (95% CI 0.817-0.829) for MLNN. Models including only free-text information yielded AUCs of 0.742 (95% CI 0.731-0.753) for logistic regression and 0.753 (95% CI 0.742-0.764) for MLNN. When both structured variables and free-text variables were included, the AUC reached 0.846 (95% CI 0.839-0.853) for logistic regression and 0.844 (95% CI 0.836-0.852) for MLNN.
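As a rough plausibility check on these figures, the sketch below recomputes the admission proportion and approximates a 95% confidence interval for one reported AUC using the Hanley-McNeil standard-error formula. The abstract does not say how the published intervals were obtained, so the formula choice and the function name `hanley_mcneil_ci` are assumptions, and the approximation ignores the survey's complex sampling design.

```python
import math

def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate CI for an AUC using the Hanley-McNeil (1982) standard error."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    half_width = z * math.sqrt(variance)
    return auc - half_width, auc + half_width

# Counts and AUC taken from the abstract.
n_visits, n_admitted = 47_200, 6_335
admission_rate = n_admitted / n_visits

# Combined structured + free-text logistic regression model.
low, high = hanley_mcneil_ci(0.846, n_admitted, n_visits - n_admitted)

print(f"Admission or transfer rate: {admission_rate:.2%}")
print(f"Approximate 95% CI for AUC 0.846: {low:.3f} to {high:.3f}")
```

Under these assumptions the interval should land close to, though not necessarily exactly on, the reported 0.839-0.853; any gap may reflect a different interval method in the original analysis.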
Overall, the accuracy of predicting hospital admission or transfer for patients presenting to ED triage was good, and it improved with the inclusion of free-text data from the patient's reason for visit, regardless of modeling approach. Natural language processing and neural networks that incorporate patient-reported outcome free text may increase predictive accuracy for hospital admission.