School of Computer Sciences, Universiti Sains Malaysia, Penang 11800, Malaysia.
Comput Intell Neurosci. 2022 Aug 30;2022:9517029. doi: 10.1155/2022/9517029. eCollection 2022.
The influx of hospital patients has become common in recent years, and hospital management departments need to redeploy healthcare resources to meet the massive medical needs of patients. In this process, the hospital length of stay (LOS) of different patients is a crucial reference for the management department, so building a model to predict LOS is of great significance. Five machine learning (ML) algorithms, namely Lasso regression (LR), ridge regression (RR), random forest regression (RFR), light gradient boosting machine (LightGBM), and extreme gradient boosting regression (XGBR), are combined with six feature encoding methods, namely label encoding, count encoding, one-hot encoding, target encoding, leave-one-out encoding, and the proposed encoding method, to construct the regression prediction models. The prediction models are built with the Scikit-Learn toolbox on the Python platform. The models are verified on the Hospital Inpatient Discharges (SPARCS De-Identified) 2017 dataset provided by the New York State Department of Health, which contains 2,343,569 instances, after removing the 2.2% of records with missing data. Performance is measured by the mean squared error (MSE) and the coefficient of determination (R²). The results show that the model combining the LightGBM algorithm with the proposed encoding method achieves the best R² (96.0%) and MSE (2.231).
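To make the five standard encodings and the two metrics concrete, here is a minimal, standard-library-only sketch. It assumes a single categorical column and a numeric LOS target. The column values, LOS numbers and helper names are hypothetical illustrations, not taken from the paper. The paper's proposed encoding method is not reproduced because the abstract does not describe it. Target and leave-one-out encoding are shown without the smoothing or noise that practical implementations often add, and in a real pipeline they should be fitted on training rows only to avoid target leakage. R² is computed as 1 − SS_res / SS_tot.

```python
from collections import Counter, defaultdict
from statistics import mean


def label_encode(values):
    """Map each distinct category to an integer, in order of first appearance."""
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
    return [mapping[v] for v in values]


def count_encode(values):
    """Replace each category by the number of times it occurs in the column."""
    counts = Counter(values)
    return [counts[v] for v in values]


def one_hot_encode(values):
    """Return one 0/1 indicator row per value, one column per (sorted) category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def _sums_and_counts(values, targets):
    """Per-category target sums and row counts (shared by the target-based encoders)."""
    sums, counts = defaultdict(float), Counter()
    for v, y in zip(values, targets):
        sums[v] += y
        counts[v] += 1
    return sums, counts


def target_encode(values, targets):
    """Replace each category by the mean target of its rows (no smoothing)."""
    sums, counts = _sums_and_counts(values, targets)
    return [sums[v] / counts[v] for v in values]


def leave_one_out_encode(values, targets):
    """Like target encoding, but each row excludes its own target from the mean."""
    sums, counts = _sums_and_counts(values, targets)
    global_mean = mean(targets)
    encoded = []
    for v, y in zip(values, targets):
        others = counts[v] - 1
        # A category seen only once has no other rows, so fall back to the global mean.
        encoded.append((sums[v] - y) / others if others > 0 else global_mean)
    return encoded


def mse(y_true, y_pred):
    """Mean squared error."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical toy data: an admission-type column and LOS in days.
    admission = ["Emergency", "Elective", "Emergency", "Urgent", "Elective"]
    los = [5, 2, 7, 3, 4]
    print("label:        ", label_encode(admission))
    print("count:        ", count_encode(admission))
    print("one-hot:      ", one_hot_encode(admission))
    print("target:       ", target_encode(admission, los))
    print("leave-one-out:", leave_one_out_encode(admission, los))
    # Use the target encoding itself as a naive LOS prediction, just to exercise the metrics.
    pred = target_encode(admission, los)
    print("MSE:", mse(los, pred), "R2:", r2(los, pred))
```

For real experiments, scikit-learn's `sklearn.metrics.mean_squared_error` and `sklearn.metrics.r2_score` compute the same two metrics and can be used to cross-check these helpers.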