Alshakhs Fatima, Alharthi Hana, Aslam Nida, Khan Irfan Ullah, Elasheri Mohamed
Department of Health Information Management & Technology, College of Public Health, Imam Abdulrahman Bin Faisal University, Dammam 34221-4237, Saudi Arabia.
Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam 34221-4237, Saudi Arabia.
Int J Gen Med. 2020 Oct 2;13:751-762. doi: 10.2147/IJGM.S250334. eCollection 2020.
Predictive analytics (PA) is an increasingly popular approach in healthcare that uses supervised machine learning algorithms to build prediction models. Isolated coronary artery bypass grafting (iCABG), an open-heart surgery, is commonly performed to treat coronary heart disease.
The aim of this study was to develop and evaluate a model to predict postoperative length of stay (PLoS) for iCABG patients using supervised machine learning techniques, and to identify the features with the highest contribution to the model.
This is a retrospective study using historical data from adult patients who underwent isolated CABG (iCABG). After initial data pre-processing, missing values were imputed using the k-nearest neighbor (kNN) method. Five prediction models were built using the Naïve Bayes, Decision Tree, Random Forest, Logistic Regression, and k-Nearest Neighbor algorithms. Data imbalance was managed using four widely used methods: oversampling, undersampling, "Both" (combined over- and undersampling), and random over-sampling examples (ROSE). Feature selection was conducted using the Boruta method. Model performance was examined using two techniques: a 70%/30% split and cross-validation. Models were compared using AUC and other metrics.
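As an illustration of the "Both" resampling strategy named above, the following is a minimal Python sketch, not the authors' R code. It assumes that "Both" means oversampling the minority class with replacement while undersampling the majority class without replacement, keeping the total number of rows unchanged. The function name resample_both, the class labels, and the 500/121 class split are hypothetical; only the total of 621 instances comes from the study.

```python
import random
from collections import Counter


def resample_both(rows, labels, seed=0):
    """Balance a binary dataset while keeping its size unchanged.

    Assumption: the majority class is undersampled without replacement and
    the minority class is oversampled with replacement, each to roughly
    half of the original number of rows.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    (majority, _), (minority, _) = counts.most_common(2)

    maj_idx = [i for i, y in enumerate(labels) if y == majority]
    min_idx = [i for i, y in enumerate(labels) if y == minority]

    n_min = len(labels) // 2        # target size of the minority class
    n_maj = len(labels) - n_min     # target size of the majority class

    picked = rng.sample(maj_idx, n_maj)       # undersample, no replacement
    picked += rng.choices(min_idx, k=n_min)   # oversample, with replacement
    rng.shuffle(picked)
    return [rows[i] for i in picked], [labels[i] for i in picked]


# Hypothetical class split; only the total (621) is taken from the study.
labels = ["short"] * 500 + ["prolonged"] * 121
rows = list(range(len(labels)))  # stand-in for feature vectors

bal_rows, bal_labels = resample_both(rows, labels)
print(Counter(bal_labels))
```

To avoid leaking duplicated minority rows into the evaluation data, a resampling step like this is normally applied to the training portion only, after the split or inside each cross-validation fold. Whether the study did so is not stated in the abstract.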
In the final dataset, six distinct features and 621 instances were used to develop the models. A total of 20 models were developed using R statistical software. The model generated using Random Forest with the "Both" resampling method and cross-validation was deemed the best fit (AUC=0.81; F1 score=0.82; recall=0.82). Attributes found to be highly predictive of PLoS were pulmonary artery systolic pressure, age, height, EuroScore II, use of an intra-aortic balloon pump, and complications during operation.
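For readers unfamiliar with the reported metrics, the sketch below shows how recall, F1 score, and AUC are defined for a binary classifier. It is an illustrative Python example on made-up toy data, not the study's data or code; the function names and the 0.5 decision threshold are assumptions. AUC is computed here as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half.

```python
def recall_precision_f1(y_true, y_pred, positive=1):
    """Recall, precision, and F1 score for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    if precision + recall == 0:
        return recall, precision, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1


def auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (made up): 1 = prolonged stay, 0 = short stay.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

recall, precision, f1 = recall_precision_f1(y_true, y_pred)
auc_value = auc(y_true, scores)
print(round(recall, 3), round(f1, 3), round(auc_value, 3))
```

The pairwise form of AUC is quadratic in the number of cases, which is acceptable for a dataset of a few hundred instances; rank-based formulas give the same value more efficiently on larger data.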
This study demonstrates the significance and effectiveness of building a model that predicts PLoS for iCABG patients using patient characteristics and pre-/intra-operative measures.