Symum Hasan, Zayas-Castro José L
Department of Industrial and Management System Engineering, University of South Florida, Tampa, FL, USA.
College of Engineering, University of South Florida, Tampa, FL, USA.
Healthc Inform Res. 2020 Jan;26(1):20-33. doi: 10.4258/hir.2020.26.1.20. Epub 2020 Jan 31.
This study aimed to develop and compare predictive models, based on supervised machine learning algorithms, for predicting prolonged length of stay (LOS) in hospitalized patients diagnosed with five different chronic conditions.
An administrative claims dataset (2008-2012) from a regional network of nine hospitals in the Tampa Bay area, Florida, USA, was used to develop the prediction models. Features were extracted from the dataset using International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) codes. Five learning algorithms, namely decision tree C5.0, linear support vector machine (LSVM), k-nearest neighbors, random forest, and multi-layered artificial neural networks, were used to build the models, together with semi-supervised anomaly detection and two feature selection methods. Class imbalance in the dataset was addressed using the Synthetic Minority Over-sampling Technique (SMOTE).
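The oversampling step can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority row is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbors. This is a standard-library illustration, not the study's implementation. The function name `smote`, the two-feature toy data, and the choice of `k` are all assumptions made for the example.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Hypothetical helper for illustration; real analyses would normally use a
    maintained library and oversample the training split only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Made-up two-feature data: the minority class stands in for prolonged LOS.
majority = [(float(x), float(y)) for x in range(6) for y in range(5)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 12.0), (12.0, 11.0), (11.5, 11.5)]

new_rows = smote(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))
```

Because every synthetic row is a convex combination of two real minority rows, the new points stay inside the region already occupied by the minority class rather than duplicating existing records.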
LSVM with wrapper feature selection performed moderately well across all patient cohorts. Using SMOTE to counter data imbalance triggered a tradeoff between the model's sensitivity and specificity, which can be masked when models have a similar area under the curve. The proposed aggregate rank selection approach yielded a more balanced model than other selection criteria. Finally, factors such as comorbid conditions, source of admission, and payer type were associated with an increased risk of prolonged LOS.
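One way to picture rank-based model selection is sketched below. The abstract does not say how the ranks are aggregated, so this example assumes the simplest scheme: rank the candidate models on each metric, sum the ranks, and pick the lowest total. The model names and metric values are invented for illustration and are not results from the study.

```python
def average_ranks(scores):
    """Map model name -> rank (1 = highest score); ties share the mean rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of models tied with ordered[i].
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for name in ordered[i:j + 1]:
            ranks[name] = mean_rank
        i = j + 1
    return ranks


def aggregate_rank(metrics):
    """metrics: {metric: {model: score}} -> {model: sum of per-metric ranks}."""
    totals = {}
    for scores in metrics.values():
        for name, rank in average_ranks(scores).items():
            totals[name] = totals.get(name, 0.0) + rank
    return totals


# Made-up values for illustration only; NOT results from the study.
metrics = {
    "auc":         {"model_a": 0.80, "model_b": 0.80, "model_c": 0.78},
    "sensitivity": {"model_a": 0.90, "model_b": 0.74, "model_c": 0.70},
    "specificity": {"model_a": 0.50, "model_b": 0.75, "model_c": 0.72},
}

totals = aggregate_rank(metrics)
best = min(totals, key=totals.get)  # lowest rank sum = most balanced
print(totals, best)
```

In this toy data, `model_a` and `model_b` share the same area under the curve, yet `model_a` buys its high sensitivity with poor specificity. Selecting on area under the curve alone cannot separate them, and selecting on sensitivity alone favors `model_a`; the rank sum favors `model_b`, which does reasonably on all three metrics.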
Prolonged LOS is mostly associated with pre-intraoperative clinical and patient socioeconomic factors. Accurately identifying patients at risk of prolonged LOS with the selected model can give hospitals a better tool for planning early discharge and allocating resources, thereby reducing avoidable hospitalization costs.