Faculdade de Ciências Médicas de Minas Gerais. Fundação Lucas Machado. Belo Horizonte, MG, Brasil.
Instituto de Acreditação e Gestão em Saúde. Departamento de Ciências de Dados. Belo Horizonte, MG, Brasil.
Rev Saude Publica. 2024 Sep 16;58:41. doi: 10.11606/s1518-8787.2024058006161. eCollection 2024.
To develop and validate a predictive model utilizing machine-learning techniques for estimating the length of hospital stay among patients who underwent coronary artery bypass grafting.
Three machine learning models (random forest, extreme gradient boosting, and neural networks) and three traditional regression models (Poisson regression, linear regression, and negative binomial regression) were trained on a dataset of 9,584 patients who underwent coronary artery bypass grafting between January 2017 and December 2021. The data were collected from hospital discharges at 133 centers in Brazil. Algorithms were ranked by root mean squared logarithmic error (RMSLE). The top-performing algorithm was validated on a previously unseen dataset of 2,627 patients. We also developed a model restricted to the top ten variables to improve usability.
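For readers unfamiliar with the ranking metric, the following is a minimal sketch of how RMSLE is commonly computed, assuming the standard definition sqrt(mean((log(1 + predicted) - log(1 + observed))^2)). The abstract does not state the exact formula or software used, so the function name `rmsle` and the sample values are hypothetical and serve only as illustration.

```python
import numpy as np


def rmsle(observed_days, predicted_days):
    """Root mean squared logarithmic error (standard definition, assumed here)."""
    observed = np.asarray(observed_days, dtype=float)
    predicted = np.asarray(predicted_days, dtype=float)
    if observed.shape != predicted.shape:
        raise ValueError("observed and predicted must have the same shape")
    if np.any(observed < 0) or np.any(predicted < 0):
        raise ValueError("RMSLE is only defined for non-negative values")
    # log1p(x) = log(1 + x), which keeps a length of stay of 0 days well defined
    log_diff = np.log1p(predicted) - np.log1p(observed)
    return float(np.sqrt(np.mean(log_diff ** 2)))


if __name__ == "__main__":
    # Hypothetical lengths of stay in days, not taken from the study
    observed = [5, 7, 12, 30]
    predicted = [6, 7, 9, 21]
    print(f"RMSLE = {rmsle(observed, predicted):.3f}")
```

Because the error is taken on a log scale, the metric penalizes relative rather than absolute differences, which suits a right-skewed outcome such as length of hospital stay.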
The random forest technique produced the model with the lowest error. The RMSLE was 0.412 (95%CI 0.405-0.419) on the training dataset and 0.454 (95%CI 0.441-0.468) on the validation dataset. Non-elective surgery, admission to a public hospital, heart failure, and age had the greatest impact on length of hospital stay.
The predictive model can be used to generate length-of-hospital-stay indices that serve as markers of efficiency and to identify patients at risk of prolonged hospitalization, helping institutions manage beds, schedule surgeries, and allocate resources.