Department of Industrial and Systems Engineering, University of Southern California (USC), 3650 McClintock Ave, Los Angeles, CA, 90089, United States of America.
Department of Health Science, California State University, Long Beach (CSULB), 1250 Bellflower Blvd, Long Beach, CA, 90840, United States of America.
BMC Med Inform Decis Mak. 2024 Aug 8;24(1):223. doi: 10.1186/s12911-024-02629-6.
There is a growing demand for advanced methods that improve the understanding and prediction of illnesses. This study focuses on sepsis, a life-threatening, dysregulated host response to infection. It aims to enhance early detection and mortality prediction for Sepsis-3 patients in order to improve hospital resource allocation.
In this study, we developed a machine learning (ML) framework that uses the MIMIC-III database to predict 30-day mortality among ICU patients with Sepsis-3. Big data extraction tools such as Snowflake were used to identify eligible patients. Decision tree models and entropy analyses guided feature selection, and the resulting candidates were curated with clinical experts into a final set of 30 relevant features. We employed the Light Gradient Boosting Machine (LightGBM) model for its efficiency and predictive power.
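The abstract does not say how the entropy analyses were computed. One plausible reading, assumed here, is to rank each candidate feature by the largest information gain (reduction in entropy of the 30-day mortality label) that a single decision-tree-style threshold split on that feature achieves. The helper names (`entropy`, `information_gain`, `best_split_gain`, `rank_features`) and the toy columns are hypothetical. This is an illustrative sketch, not the authors' pipeline, and in the study the final 30 features were also curated by clinical experts.

```python
import math
from collections import Counter

# Hypothetical helpers: an illustrative sketch, not the study's code.


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels, threshold):
    """Reduction in label entropy from splitting at feature <= threshold."""
    left = [y for x, y in zip(feature_values, labels) if x <= threshold]
    right = [y for x, y in zip(feature_values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def best_split_gain(feature_values, labels):
    """Best gain over thresholds placed midway between sorted unique values."""
    values = sorted(set(feature_values))
    if len(values) < 2:
        return 0.0  # a constant feature cannot split the data
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(information_gain(feature_values, labels, t) for t in candidates)


def rank_features(columns, labels):
    """Return (name, gain) pairs sorted by decreasing best-split gain."""
    scores = {name: best_split_gain(vals, labels) for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For a balanced binary label the entropy is 1 bit, so a hypothetical column that separates survivors from non-survivors perfectly scores a gain of 1.0 and ranks first. Single-split gain ignores interactions between features, which is one reason a tree-based model and clinical review would still be needed to settle on the final feature set.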
The study cohort comprised 9118 Sepsis-3 patients. Our preprocessing techniques significantly improved both AUC and accuracy. The LightGBM model achieved an AUC of 0.983 (95% CI: [0.980-0.990]), an accuracy of 0.966, and an F1-score of 0.910. LightGBM showed a 6% improvement over our best baseline model and a 14% improvement over the best previously published result. We attribute these gains to (I) the inclusion of a novel and pivotal feature, hospital length of stay (HOSP_LOS), which was absent from previous studies, and (II) LightGBM's gradient boosting architecture, which enables robust predictions on high-dimensional data while maintaining computational efficiency, as its learning curve demonstrates.
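The abstract reports AUC with a 95% confidence interval, accuracy, and F1-score, but not how the interval was obtained. A common choice, assumed here, is a percentile bootstrap over held-out predictions. The sketch computes AUC through the Mann-Whitney rank-sum identity (tied scores receive average ranks), accuracy and F1 at a hypothetical 0.5 decision threshold, and a bootstrap interval. The function names, the threshold, and the bootstrap settings are illustrative assumptions, not the authors' evaluation code.

```python
import random

# Hypothetical evaluation helpers: an illustrative sketch, not the study's code.


def roc_auc(y_true, scores):
    """AUC via the Mann-Whitney rank-sum identity; tied scores get average ranks."""
    n = len(y_true)
    n_pos = sum(y_true)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the block of tied scores starting at position i.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def accuracy_and_f1(y_true, scores, threshold=0.5):
    """Accuracy and positive-class F1 (1 = died within 30 days) at a fixed threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return accuracy, f1


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC, resampling patients with replacement."""
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("need both classes to compute AUC")
    rng = random.Random(seed)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that contain only one class
            aucs.append(roc_auc(yb, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int(alpha / 2 * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

With labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, `roc_auc` returns 0.75, and `accuracy_and_f1` returns an accuracy of 0.75 and an F1 of 2/3 at the 0.5 threshold. The rank-based AUC gives the same value as comparing every positive-negative pair but runs in O(n log n), which keeps a 1000-resample bootstrap practical on a test set drawn from a cohort of this size.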
Our preprocessing methodology reduced the feature set to a smaller number of relevant features and identified a crucial feature overlooked in previous studies. The proposed model demonstrated high predictive power and generalization capability, highlighting the potential of ML in ICU settings. This model could help streamline ICU resource allocation and support tailored interventions for Sepsis-3 patients.