Omar Evi Diana, Mat Hasnah, Abd Karim Ainil Zafirah, Sanaudi Ridwan, Ibrahim Fairol H, Omar Mohd Azahadi, Ismail Muhd Zulfadli Hafiz, Jayaraj Vivek Jason, Goh Bak Leong
Sector for Biostatistics and Data Repository, National Institutes of Health, Ministry of Health Malaysia, Shah Alam, Selangor, Malaysia.
Hospital Sultan Idris Shah Serdang, Ministry of Health Malaysia, Kajang, Selangor, Malaysia.
Int J Nephrol Renovasc Dis. 2024 Jul 24;17:197-204. doi: 10.2147/IJNRD.S461028. eCollection 2024.
This study aimed to identify the best-performing algorithm for predicting Acute Kidney Injury (AKI) necessitating dialysis following cardiac surgery.
The dataset comprised patient data from a tertiary cardiothoracic center in Malaysia between 2011 and 2015, sourced from electronic health records. Extensive preprocessing and feature selection were performed to ensure data quality and relevance. Four machine learning algorithms were applied: Logistic Regression, Gradient Boosted Trees, Support Vector Machine, and Random Forest. The dataset was split into training and validation sets, and hyperparameters were tuned. Evaluation criteria included accuracy, Area Under the ROC Curve (AUC), precision, F-measure, sensitivity, and specificity. Ethical guidelines for data use and patient privacy were rigorously followed throughout the study.
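The evaluation criteria listed above can be sketched from first principles. The snippet below is a minimal illustration on made-up labels and scores, not the study's data or code; the function names, the 0.5 cut-off, and the toy values are all assumptions introduced here. It assumes both classes are present and at least one case is predicted positive.

```python
def evaluation_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, precision and F-measure from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    sensitivity = tp / (tp + fn)  # share of true cases that are flagged
    specificity = tn / (tn + fp)  # share of non-cases that are cleared
    precision = tp / (tp + fp)    # share of flagged cases that are true cases
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
    }


def auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = outcome occurred, 0 = it did not.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

metrics = evaluation_metrics(y_true, y_pred)
toy_auc = auc(y_true, scores)
print(metrics, toy_auc)
```

Unlike the other five criteria, AUC is computed from the raw scores rather than from thresholded predictions, so it does not depend on the chosen cut-off.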
Gradient Boosted Trees emerged as the top performer, with the highest accuracy (88.66%) together with a strong AUC (94.61%) and sensitivity (91.30%). Random Forest displayed a similarly strong AUC (94.78%) and accuracy (87.39%). In contrast, the Support Vector Machine showed the highest sensitivity (98.57%) but low specificity (59.55%), along with lower accuracy (79.02%) and precision (70.81%). Logistic Regression maintained a balance between sensitivity (87.70%) and specificity (87.05%).
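The link between these figures can be made explicit: accuracy is a weighted average of sensitivity and specificity, with weights given by the share of positive and negative cases. The sketch below applies that identity to the reported Support Vector Machine figures. It assumes all four figures were computed on the same validation set at a single decision threshold, which the abstract does not state; the variable names are introduced here for illustration.

```python
# Reported Support Vector Machine figures, written as fractions.
sensitivity = 0.9857
specificity = 0.5955
accuracy = 0.7902
reported_precision = 0.7081

# accuracy = p * sensitivity + (1 - p) * specificity, solved for p,
# where p is the fraction of positive cases in the evaluated set.
positive_fraction = (accuracy - specificity) / (sensitivity - specificity)

# Precision implied by that fraction: true positives over all flagged cases.
true_pos = positive_fraction * sensitivity
false_pos = (1 - positive_fraction) * (1 - specificity)
implied_precision = true_pos / (true_pos + false_pos)

print(f"implied positive fraction: {positive_fraction:.3f}")
print(f"implied precision: {implied_precision:.4f} (reported {reported_precision})")
```

Under that assumption, the implied positive fraction is close to one half and the implied precision agrees with the reported 70.81% to within rounding. This would be consistent with a roughly balanced evaluation set, although the abstract does not say how the classes were distributed.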
These findings suggest that Gradient Boosted Trees and Random Forest may be effective methods for identifying patients who will develop AKI following cardiac surgery. However, the specific goals, the sensitivity/specificity trade-off, and the practical implications should all be considered when choosing an algorithm.
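The sensitivity/specificity trade-off can be illustrated by moving the decision threshold of a single classifier. The snippet below uses made-up scores and thresholds, not values from the study; the function name and data are hypothetical.

```python
def sensitivity_specificity(y_true, scores, threshold):
    """Sensitivity and specificity when scores at or above the threshold are flagged positive."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    sensitivity = sum(1 for s in pos if s >= threshold) / len(pos)
    specificity = sum(1 for s in neg if s < threshold) / len(neg)
    return sensitivity, specificity


# Hypothetical toy data: 1 = outcome occurred, 0 = it did not.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

# Lowering the threshold flags more patients: sensitivity rises, specificity falls.
trade_off = {t: sensitivity_specificity(y_true, scores, t) for t in (0.25, 0.5, 0.75)}
for threshold, (sens, spec) in trade_off.items():
    print(f"threshold={threshold:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

A low threshold suits a screening goal where missed cases are costly, while a higher threshold reduces false alarms at the price of missing some true cases.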