Caires Silveira Elena, Mattos Pretti Soraya, Santos Bruna Almeida, Santos Corrêa Caio Fellipe, Madureira Silva Leonardo, Freire de Melo Fabrício
Multidisciplinary Institute of Health, Federal University of Bahia, Vitória da Conquista 45-029094, Brazil.
World J Crit Care Med. 2022 Sep 9;11(5):317-329. doi: 10.5492/wjccm.v11.i5.317.
Intensive care unit (ICU) patients require continuous monitoring of several clinical and laboratory parameters that directly influence their clinical course and the staff's decision-making. These data are vital to the care of these patients and are already used by several scoring systems. In this context, machine learning approaches have been used to make medical predictions from clinical data, including predictions of patient outcomes.
To develop a binary classifier for the outcome of death in ICU patients based on clinical and laboratory parameters, we obtained a set of 1087 instances and 50 variables from ICU patients admitted to the emergency department from the "WiDS (Women in Data Science) Datathon 2020: ICU Mortality Prediction" dataset.
For categorical variables, frequencies and risk ratios were calculated. Numerical variables were summarized as means and standard deviations and compared using Mann-Whitney tests. We then divided the data into a training set (80%) and a test set (20%). The training set was used to train a predictive model based on the Random Forest algorithm, and the test set was used to evaluate the model's predictive performance.
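The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it uses synthetic data from scikit-learn's `make_classification` in place of the WiDS dataset, and the stratified split, the random seeds, and the hyperparameters (such as `n_estimators=200`) are assumptions that the abstract does not specify.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the WiDS dataset), sized like the study's set:
# 1087 instances and 50 variables, with a binary outcome y (1 = death).
X, y = make_classification(
    n_samples=1087, n_features=50, n_informative=10, random_state=0
)

# Mann-Whitney U test comparing one numerical variable between outcome groups.
u_stat, p_value = mannwhitneyu(X[y == 1, 0], X[y == 0, 0], alternative="two-sided")

# 80/20 split. Stratifying on the outcome is an assumption made here.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Random Forest classifier with illustrative (assumed) hyperparameters.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Evaluate on the held-out test set.
y_pred = model.predict(X_test)
accuracy = float(np.mean(y_pred == y_test))

# Impurity-based feature importances sum to 1; keep the five largest.
importances = model.feature_importances_
top_features = np.argsort(importances)[::-1][:5]
```

With 1087 instances, a 20% test fraction gives a held-out set of 218 rows in scikit-learn, since the test size is rounded up. The accuracy and importances produced by this sketch come from synthetic data and say nothing about the study's results.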
A statistically significant association was identified between need for intubation, as well as predominant systemic cardiovascular involvement, and hospital death. Several of the numerical variables analyzed (for instance Glasgow Coma Score values, mean arterial pressure, temperature, pH, and lactate, creatinine, albumin and bilirubin values) were also significantly associated with the outcome of death. On the test set (n = 218), the proposed binary Random Forest classifier had an accuracy of 80.28%, sensitivity of 81.82%, specificity of 79.43%, positive predictive value of 73.26%, negative predictive value of 84.85%, F1 score of 0.74, and area under the curve of 0.85. The most important predictive variables were the maximum and minimum lactate values, which together accounted for a predictive importance of 15.54%.
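For readers less familiar with these metrics, the sketch below shows how each one is conventionally derived from the four cells of a binary confusion matrix. The function name `classification_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not the study's confusion matrix, which the abstract does not report.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not taken from the study).
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The area under the curve cannot be obtained from a single confusion matrix, because it summarizes performance across all classification thresholds and therefore needs the predicted probabilities.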
We demonstrated the efficacy of a Random Forest machine learning algorithm for handling clinical and laboratory data from patients under intensive monitoring. We therefore endorse the emerging notion that machine learning has great potential to help us critically question existing methodologies, allowing improvements that reduce mortality.