AlMuhaideb Sarab, Bin Shawyah Alanoud, Alhamid Mohammed F, Alabbad Arwa, Alabbad Maram, Alsergani Hani, Alswailem Osama
Department of Computer Science, College of Computer and Information Sciences, King Saud University, P.O. Box 266, Riyadh 11362, Saudi Arabia.
Healthcare Information Technology Affairs (HITA), King Faisal Specialist Hospital & Research Center, P.O. Box 3354, Riyadh 11211, Saudi Arabia.
Healthcare (Basel). 2024 May 29;12(11):1110. doi: 10.3390/healthcare12111110.
Efficient management of hospital resources is essential for providing high-quality healthcare while ensuring sustainability. Length of stay (LOS), the duration from admission to discharge, directly impacts patient outcomes and resource utilization. Accurate LOS prediction offers numerous benefits, including reducing readmissions, ensuring appropriate staffing, and facilitating informed discharge planning. While conventional methods rely on statistical models and clinical expertise, recent advances in machine learning (ML) present promising avenues for enhancing LOS prediction. This research develops an ML-based LOS prediction model trained on a comprehensive real-world dataset and discusses the factors that matter for the practical deployment of trained ML models in clinical settings. It builds a comprehensive adult cardiac patient dataset, SaudiCardioStay (SCS), from the King Faisal Specialist Hospital & Research Centre (KFSH&RC) in Saudi Arabia, comprising 4930 patient encounters for 3611 unique patients collected from 2019 to 2022 (excluding 2020). A diverse range of classical ML models (Random Forest (RF), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LGBM), artificial neural networks (ANNs), and Average Voting Regression (AvgVotReg)) is implemented on the SCS dataset to explore the potential of existing ML models in LOS prediction. In addition, this study introduces a novel approach for LOS prediction that incorporates a dedicated LOS classifier within an ensemble methodology (Two-Level Sequential Cascade Generalization (2LSCG), Three-Level Sequential Cascade Generalization (3LSCG), and Parallel Cascade Generalization (PCG)), aiming to enhance prediction accuracy and capture nuanced patterns in healthcare data. The experimental results show that the 3LSCG model achieves the best mean absolute error (MAE), 0.1700. The AvgVotReg model performs comparably, with an MAE of 0.1703. Finally, a detailed analysis of the practical implications, limitations, and recommendations concerning the deployment of ML approaches in actual clinical settings is presented.
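The abstract names three building blocks without giving their details: an MAE metric, an average-voting regressor, and a cascade in which a LOS classifier feeds a downstream regressor. The sketch below illustrates the general idea only and is not the authors' implementation. Every name in it (`ThresholdClassifier`, `GroupMeanRegressor`, `cascade_fit_predict`, the `long_stay_days` cutoff, and the toy data) is a hypothetical stand-in, and the two toy learners take the place of real models such as RF or XGBoost. It assumes a two-level cascade in which the classifier's predicted class is appended to the features seen by the regressor.

```python
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def average_vote(predictions):
    """Average the per-sample predictions of several regressors."""
    return [mean(sample) for sample in zip(*predictions)]


class ThresholdClassifier:
    """Level 1 (toy): labels a stay long (1) when feature 0 exceeds a learned cut."""

    def fit(self, X, y_class):
        short = [x[0] for x, c in zip(X, y_class) if c == 0]
        long_ = [x[0] for x, c in zip(X, y_class) if c == 1]
        # Midpoint between the two class means of feature 0.
        self.cut = (mean(short) + mean(long_)) / 2
        return self

    def predict(self, X):
        return [1 if x[0] > self.cut else 0 for x in X]


class GroupMeanRegressor:
    """Level 2 (toy): predicts the mean training LOS for the value in the last column."""

    def fit(self, X, y):
        groups = {}
        for x, target in zip(X, y):
            groups.setdefault(x[-1], []).append(target)
        self.means = {k: mean(v) for k, v in groups.items()}
        self.default = mean(y)  # fallback for an unseen group
        return self

    def predict(self, X):
        return [self.means.get(x[-1], self.default) for x in X]


def cascade_fit_predict(X_train, y_train, X_test, long_stay_days):
    """Two-level sequential cascade: classifier output becomes an extra feature."""
    # Assumption: LOS classes are derived by thresholding the training LOS.
    y_class = [1 if d > long_stay_days else 0 for d in y_train]
    clf = ThresholdClassifier().fit(X_train, y_class)
    # Extend each feature vector with the level-1 prediction.
    X_train_aug = [x + [c] for x, c in zip(X_train, clf.predict(X_train))]
    X_test_aug = [x + [c] for x, c in zip(X_test, clf.predict(X_test))]
    reg = GroupMeanRegressor().fit(X_train_aug, y_train)
    return reg.predict(X_test_aug)


# Toy data (fabricated for illustration, unrelated to the SCS dataset).
X_train = [[1.0], [2.0], [8.0], [9.0]]
y_train = [2.0, 3.0, 10.0, 12.0]
X_test = [[1.5], [8.5]]
y_test = [3.0, 10.0]

cascade_pred = cascade_fit_predict(X_train, y_train, X_test, long_stay_days=7)
print(cascade_pred)               # [2.5, 11.0]
print(mae(y_test, cascade_pred))  # 0.75
print(average_vote([[2.0, 4.0], [4.0, 8.0]]))  # [3.0, 6.0]
```

A three-level or parallel variant would, under the same assumption, chain or combine further learners in the same way. The abstract does not specify how the authors arrange those levels, so that detail is left open here.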