Karajizadeh Mehrdad, Nasiri Mahdi, Yadollahi Mahnaz, Zolfaghari Amir Hussain, Pakdam Ali
School of Management & Information Sciences, Shiraz University of Medical Sciences, Shiraz, Iran.
Trauma Research Center, Shahid Rajaee (Emtiaz) Trauma Hospital, Shiraz University of Medical Sciences, Shiraz, Iran.
Healthc Inform Res. 2020 Oct;26(4):284-294. doi: 10.4258/hir.2020.26.4.284. Epub 2020 Oct 31.
Machine learning has been widely used to predict diseases and to derive valuable knowledge in the healthcare domain. Our objective was to predict in-hospital mortality from hospital-acquired infections in trauma patients using an unbalanced dataset.
Our study was a cross-sectional analysis of trauma patients with hospital-acquired infections who were admitted to Shiraz Trauma Hospital from March 20, 2017, to March 21, 2018. The study data were obtained from the hospital infection surveillance database. The data included sex, age, mechanism of injury, injured body region, severity score, type of intervention, day of infection after admission, and the causative microorganisms. We developed mortality prediction models for trauma patients with hospital-acquired infections using random under-sampling, random over-sampling, clustering (k-means)-C5.0, SMOTE-C5.0, ADASYN-C5.0, SMOTE-SVM, ADASYN-SVM, SMOTE-ANN, and ADASYN-ANN. All mortality predictions were performed in IBM SPSS Modeler 18.
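The abstract names the resampling schemes but not how they were configured in IBM SPSS Modeler 18. The following is a minimal, standard-library Python sketch of three of them (random under-sampling, random over-sampling, and basic SMOTE) on feature rows stored as lists of floats. The function names, the `seed` parameter and the default of `k=5` neighbours are assumptions for illustration, not the authors' settings, and SMOTE as sketched assumes numeric (for example, encoded) features. ADASYN, also used in the study, reuses the SMOTE interpolation step but generates more synthetic rows around minority cases whose neighbourhoods are dominated by the majority class; it is not shown here.

```python
import math
import random


def random_under_sample(X, y, majority, seed=0):
    """Randomly drop majority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    maj = [i for i, label in enumerate(y) if label == majority]
    mino = [i for i, label in enumerate(y) if label != majority]
    keep = sorted(rng.sample(maj, len(mino)) + mino)
    return [X[i] for i in keep], [y[i] for i in keep]


def random_over_sample(X, y, minority, seed=0):
    """Duplicate randomly chosen minority rows (with replacement) until classes match."""
    rng = random.Random(seed)
    mino = [i for i, label in enumerate(y) if label == minority]
    n_extra = (len(y) - len(mino)) - len(mino)  # majority count minus minority count
    extra = [rng.choice(mino) for _ in range(n_extra)]
    return X + [X[i] for i in extra], y + [y[i] for i in extra]


def smote(minority_X, n_new, k=5, seed=0):
    """Basic SMOTE: each synthetic row lies on the segment between a random
    minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority_X))
        base = minority_X[i]
        neighbours = sorted(
            (p for j, p in enumerate(minority_X) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): how far to move from base toward nb
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic
```

Random under-sampling discards information from the majority class, while random over-sampling only repeats existing minority rows; SMOTE instead adds new, interpolated minority rows, which is why it is paired with several classifiers in the study.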
We studied 549 individuals with hospital-acquired infections in a trauma hospital in Shiraz during 2017 and 2018. Prediction accuracy before balancing the dataset was 86.16%. By contrast, prediction accuracy on the datasets balanced by random under-sampling, random over-sampling, clustering (k-means)-C5.0, SMOTE-C5.0, ADASYN-C5.0, and SMOTE-SVM was 70.69%, 94.74%, 93.02%, 93.66%, 90.93%, and 100%, respectively.
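Accuracy here is the fraction of patients whose outcome is predicted correctly. On an unbalanced dataset, a model that always predicts the majority class already scores the majority-class share, so accuracy figures are commonly read against that baseline and alongside per-class recall. The abstract does not report how many of the 549 patients died; the 15-of-100 split below is hypothetical and the helper names are illustrative.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of cases whose predicted label equals the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Accuracy of a trivial model that always predicts the most frequent class."""
    _, count = Counter(y_true).most_common(1)[0]
    return count / len(y_true)


def recall(y_true, y_pred, cls):
    """Share of cases truly in `cls` that the model also labels `cls`."""
    hits = [p == cls for t, p in zip(y_true, y_pred) if t == cls]
    return sum(hits) / len(hits)


# Hypothetical cohort: 15 deaths among 100 patients (not the study's counts).
y_true = ["died"] * 15 + ["survived"] * 85
y_pred = ["survived"] * 100              # a model that never predicts death
print(accuracy(y_true, y_pred))          # 0.85
print(majority_baseline(y_true))         # 0.85
print(recall(y_true, y_pred, "died"))    # 0.0
```

In this toy case the model reaches 85% accuracy while identifying none of the deaths, which is the situation that class balancing is meant to counter.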
Our findings demonstrate that balancing an unbalanced dataset increases the accuracy of the classification model. In addition, predicting mortality with a clustered under-sampling approach was more precise than with random under-sampling or random over-sampling.
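The abstract does not specify how the k-means step was combined with under-sampling. One common variant, assumed in the sketch below, clusters the majority-class rows with k equal to the number of minority rows and keeps the real majority row nearest each centroid, so the retained rows cover the whole majority class rather than a random subset of it. The code is standard-library Python with illustrative names (`kmeans`, `cluster_under_sample`); the classifier (C5.0 in the study) would then be trained on the returned rows.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns the final centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def cluster_under_sample(X, y, majority, seed=0):
    """Reduce the majority class to the minority-class size: cluster the majority
    rows with k-means (k = number of minority rows) and keep, for each cluster,
    the real majority row closest to its centroid."""
    maj = [x for x, label in zip(X, y) if label == majority]
    mino = [(x, label) for x, label in zip(X, y) if label != majority]
    remaining = list(maj)
    reps = []
    for centre in kmeans(maj, k=len(mino), seed=seed):
        # Pick the nearest not-yet-used row so no record is kept twice.
        best = min(remaining, key=lambda p: math.dist(p, centre))
        remaining.remove(best)
        reps.append(best)
    X_bal = reps + [x for x, _ in mino]
    y_bal = [majority] * len(reps) + [label for _, label in mino]
    return X_bal, y_bal
```

Keeping real rows rather than the centroids themselves is a design choice: every retained majority case is an actual patient record, whereas centroids would be averaged, synthetic profiles.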