Babu M, Sappani M, Joy M, Chandiraseharan V K, Jeyaseelan L, Sudarsanam T D
Department of Biostatistics, Christian Medical College, Vellore, Tamil Nadu, India.
Department of Medicine and Clinical Epidemiology Unit, Christian Medical College, Vellore, Tamil Nadu, India.
J Postgrad Med. 2024 Oct 1;70(4):209-216. doi: 10.4103/jpgm.jpgm_357_24. Epub 2024 Dec 6.
Machine learning (ML) has been applied to predicting outcomes following sepsis. This study aims to assess the utility of a stacked ensemble algorithm in predicting mortality.
This cohort study included adults admitted with sepsis to a medical unit of a tertiary care hospital. The data were divided into a training data set (70%) and a test data set (30%). The Boruta algorithm was used to identify important features. In the first phase of the stacked ensemble model, four weak learners were trained: random forest (RF), support vector machine (SVM), elastic net, and gradient boosting machine. In the second phase, an SVM was used as the meta learner to combine the predictions of all weak learners. All models were validated on the test data.
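The two-phase design described above can be sketched in Python with scikit-learn. This is a minimal illustration, not the authors' implementation: the abstract does not name the software used, the data here are synthetic (`make_classification`), every hyperparameter is an assumption, the elastic net is approximated by elastic-net-penalised logistic regression, and the Boruta feature-selection step is omitted. The names `build_stack` and `run_demo` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stack(seed=0):
    """Phase 1 base learners combined by a phase 2 SVM meta learner."""
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
        # SVMs and penalised regression are scale-sensitive, so standardise first.
        ("svm", make_pipeline(StandardScaler(),
                              SVC(probability=True, random_state=seed))),
        ("enet", make_pipeline(
            StandardScaler(),
            LogisticRegression(penalty="elasticnet", solver="saga",
                               l1_ratio=0.5, max_iter=5000,
                               random_state=seed))),
        ("gbm", GradientBoostingClassifier(random_state=seed)),
    ]
    # The meta learner is trained on out-of-fold predicted probabilities
    # from the base learners (5-fold cross-validation inside the stack).
    meta_learner = SVC(probability=True, random_state=seed)
    return StackingClassifier(estimators=base_learners,
                              final_estimator=meta_learner,
                              cv=5, stack_method="predict_proba")


def run_demo(seed=0):
    """Fit on a 70% training split and report AUC on both splits."""
    # Synthetic stand-in for the cohort: roughly 27% positive class.
    X, y = make_classification(n_samples=600, n_features=10,
                               weights=[0.73, 0.27], random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed)
    stack = build_stack(seed).fit(X_train, y_train)
    train_auc = roc_auc_score(y_train, stack.predict_proba(X_train)[:, 1])
    test_auc = roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1])
    return len(X_train), len(X_test), train_auc, test_auc


if __name__ == "__main__":
    n_train, n_test, train_auc, test_auc = run_demo()
    print(f"train n={n_train}, test n={n_test}")
    print(f"train AUC={train_auc:.3f}, test AUC={test_auc:.3f}")
```

Reporting the AUC on both splits makes any gap between training and test performance visible, which is the comparison the study draws.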
In our cohort of 1,453 patients, the mortality rate was 27% (95% confidence interval [CI]: 25%, 29%). The Boruta algorithm identified inotrope use and assisted ventilation as the most important predictors of mortality. The random forest outperformed the other algorithms (area under the curve [AUC]: 97.91%). The AUCs for the other models were 95.21% (SVM), 93.67% (gradient boosting machine [GBM]), and 91.42% (elastic net [GLMnet]). However, stacking all of the above models yielded an AUC of 92.14%. In the test data set, accuracy decreased for all methods, including RF (from 92.6% to 85.5%).
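The reported confidence interval for mortality can be checked with a short calculation. The sketch below assumes a normal-approximation (Wald) interval; the abstract does not state which interval method the authors used, and `wald_ci` is a hypothetical helper name.

```python
import math


def wald_ci(p, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    return p - z * se, p + z * se


# Reported values: 27% mortality among 1,453 patients.
low, high = wald_ci(0.27, 1453)
print(f"95% CI: {low:.1%} to {high:.1%}")  # 95% CI: 24.7% to 29.3%
```

Rounded to whole percentages, this gives 25% to 29%, consistent with the interval reported above.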
The random forest showed high accuracy in the training data and moderate accuracy in the test data. We suggest that more regional open-access intensive care databases be established, which would help make machine learning a greater support for healthcare personnel.