Kiremit Birgül Yabana, Şahin Durmuş Özkan
Department of Health Care Management Ondokuz Mayıs University Atakum, Samsun, Turkey.
Department of Computer Engineering Ondokuz Mayıs University Atakum, Samsun, Turkey.
Comput Biol Med. 2025 Sep;196(Pt B):110825. doi: 10.1016/j.compbiomed.2025.110825. Epub 2025 Aug 4.
The length of stay (LOS) for patients in hospitals is crucial for workforce planning, resource allocation, and bed capacity management, impacting healthcare costs, future needs, and financial planning. This study focuses on calculating the LOS for Chronic Kidney Disease (CKD) patients admitted as inpatients and estimating their hospital bills based on services rendered during their stay. Using data from 5,583 CKD patients and 11 input variables, various machine learning (ML) algorithms were applied to develop regression and classification models. To optimize model performance and address potential overfitting, feature selection techniques were also employed. The Random Forest (RF) algorithm achieved the highest performance for bill amount estimation, with a Correlation Coefficient (CC) of 0.736. The algorithms predicting LOS showed even more promising results, all scoring above 0.848 on the CC metric. The best performances came from Support Vector Machine (SVM), M5P trees, and RF, with Mean Absolute Error (MAE) and CC results of 2.580 days and 0.875, 2.587 days and 0.880, and 2.611 days and 0.880, respectively. LOS was also categorized as short or long using ML algorithms, with Logistic Regression (LogR) achieving the best classification results: 0.944 on the AUC-ROC (Area Under the ROC Curve) metric and 0.872 on the F-Measure metric. The RF algorithm also excelled in classification by patient unit, scoring 0.788 on AUC-ROC and 0.863 on accuracy. Additionally, the feature selection experiments revealed that reducing the number of input variables maintained prediction accuracy for bill amount and LOS but generally degraded classification performance. Feature selection was identified as a critical challenge, particularly in balancing the trade-off between dimensionality reduction and predictive accuracy.
While dimensionality reduction can improve computational efficiency, careful selection of input variables is essential to maintain robust classification performance. Given the lengthy treatment processes for CKD patients, accurate predictions of LOS, billing amounts, and admission units will assist health managers in planning for future resource needs, such as medical supplies and workforce. Ultimately, this study provides insights that can enhance the financial sustainability and management of healthcare services.
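The LOS regression setup described above (11 input variables, a Random Forest model, evaluation by MAE and CC) can be sketched as follows. This is a hypothetical illustration, not the authors' code: the data here are synthetic stand-ins, and scikit-learn's `RandomForestRegressor` is assumed as the RF implementation.

```python
# Minimal sketch of the LOS regression pipeline described in the abstract,
# using synthetic data in place of the (unavailable) CKD patient records.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_patients = 1000
X = rng.normal(size=(n_patients, 11))  # 11 input variables, as in the study
# Synthetic LOS in days: a noisy function of two of the features.
y = 5 + 2 * X[:, 0] + X[:, 1] + rng.normal(scale=1.0, size=n_patients)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)

# Evaluate with the same two metrics reported in the study.
mae = mean_absolute_error(y_te, pred)
cc = np.corrcoef(y_te, pred)[0, 1]
print(f"MAE = {mae:.3f} days, CC = {cc:.3f}")
```

The classification variants (short vs. long stay, or admission unit) would follow the same pattern with a classifier such as `LogisticRegression` or `RandomForestClassifier` and AUC-ROC/F-measure as metrics.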