Medical Biology Research Center, Kermanshah University of Medical Sciences, Kermanshah, Iran.
Department of Animal Science, Ferdowsi University of Mashhad, Mashhad, Iran.
PLoS One. 2021 Jul 21;16(7):e0254976. doi: 10.1371/journal.pone.0254976. eCollection 2021.
This paper identifies prognostic factors for survival in patients with acute myeloid leukemia (AML) using machine learning techniques. We integrated machine learning algorithms with feature selection methods and compared their performance to identify the most suitable factors for assessing the survival of AML patients. Six data mining algorithms, namely Decision Tree, Random Forest, Logistic Regression, Naive Bayes, W-Bayes Net, and Gradient Boosted Tree (GBT), were employed for the detection model and implemented using the common data mining tool RapidMiner and the open-source R package. To improve the predictive ability of the model, a set of features was selected by employing multiple feature selection methods. Classification accuracy was obtained using 10-fold cross-validation for each combination of feature selection method and machine learning algorithm. Model performance was assessed with several measures: accuracy, kappa, sensitivity, specificity, positive predictive value, negative predictive value, and area under the ROC curve (AUC). Our results showed that GBT combined with feature selection via the Relief algorithm performed best in predicting the survival of AML patients, with an accuracy of 85.17% and an AUC of 0.930.
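The performance measures listed above can all be derived from a binary confusion matrix and a set of predicted scores. The following is a minimal Python sketch of the standard definitions, not the authors' RapidMiner or R implementation. The names `ConfusionMatrix`, `binary_metrics`, and `auc_from_scores`, as well as the counts and scores in the usage lines, are hypothetical and chosen only for illustration; they are not data from the study. The sketch assumes every denominator is non-zero.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def binary_metrics(cm):
    """Return accuracy, kappa, sensitivity, specificity, PPV and NPV."""
    n = cm.tp + cm.fp + cm.fn + cm.tn
    accuracy = (cm.tp + cm.tn) / n
    # Agreement expected by chance, used by Cohen's kappa.
    p_pos = ((cm.tp + cm.fp) / n) * ((cm.tp + cm.fn) / n)
    p_neg = ((cm.tn + cm.fn) / n) * ((cm.tn + cm.fp) / n)
    p_chance = p_pos + p_neg
    return {
        "accuracy": accuracy,
        "kappa": (accuracy - p_chance) / (1 - p_chance),
        "sensitivity": cm.tp / (cm.tp + cm.fn),
        "specificity": cm.tn / (cm.tn + cm.fp),
        "ppv": cm.tp / (cm.tp + cm.fp),
        "npv": cm.tn / (cm.tn + cm.fn),
    }


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Illustrative counts and scores only; not taken from the paper.
example = ConfusionMatrix(tp=40, fp=10, fn=5, tn=45)
metrics = binary_metrics(example)
example_auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"auc: {example_auc:.3f}")
```

In a 10-fold cross-validation setting, these measures would typically be computed on each held-out fold and then averaged, or computed once on the pooled out-of-fold predictions; the abstract does not state which convention was used.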