Department of Health Information Technology, Faculty of Paramedical, Ilam University of Medical Sciences, Ilam, Iran.
Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
BMC Med Inform Decis Mak. 2022 Sep 6;22(1):236. doi: 10.1186/s12911-022-01980-w.
Chronic myeloid leukemia (CML) is a myeloproliferative disorder resulting from the translocation of chromosomes 9 and 22. CML accounts for 15-20% of all cases of leukemia. Although bone marrow transplant and, more recently, tyrosine kinase inhibitors (TKIs) as a first-line treatment have significantly prolonged survival in CML patients, accurately predicting survival from available patient-level factors can be challenging. We aimed to predict 5-year survival among CML patients using eight machine learning (ML) algorithms and to compare their performance.
Data from 837 CML patients were retrospectively extracted and randomly split into training and test sets (70:30 ratio). The outcome variable was 5-year survival, coded as alive or deceased. Both the full-feature dataset and the subset of important features selected by minimal redundancy maximal relevance (mRMR) feature selection were fed into eight ML techniques: eXtreme gradient boosting (XGBoost), multilayer perceptron (MLP), pattern recognition network, k-nearest neighbors (KNN), probabilistic neural network, support vector machine (SVM) (kernel = linear), SVM (kernel = RBF), and J-48. The scikit-learn library in Python was used to implement the models. Finally, the performance of the developed models was measured using several evaluation criteria, each reported with a 95% confidence interval (CI).
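The workflow described above (70:30 split, feature selection, RBF-kernel SVM, cross-validation) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the data are synthetic, the stratified split, the standardization step, and the choice of five selected features are assumptions, and a plain mutual-information filter from scikit-learn is used as a stand-in for mRMR, which scikit-learn does not provide (the filter scores relevance only and does not penalize redundancy).

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the patient table: 837 rows as in the study,
# but the 20 features and the labels are made up for illustration.
X, y = make_classification(
    n_samples=837, n_features=20, n_informative=5, random_state=0
)

# 70:30 split; stratifying on the outcome is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Relevance-only filter used here in place of mRMR (assumption).
relevance = partial(mutual_info_classif, random_state=0)


def build_model(k):
    """Scale, keep the k highest-scoring features, then fit an RBF-kernel SVM."""
    return make_pipeline(
        StandardScaler(),
        SelectKBest(score_func=relevance, k=k),
        SVC(kernel="rbf"),
    )


# Tenfold cross-validation on the training portion for both feature sets.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = {}
for name, k in {"full": "all", "selected": 5}.items():
    scores[name] = cross_val_score(
        build_model(k), X_train, y_train, cv=cv, scoring="accuracy"
    )
    print(f"{name}: mean CV accuracy = {scores[name].mean():.3f}")

# Refit on the whole training portion and score once on the held-out 30%.
final_model = build_model(5).fit(X_train, y_train)
test_accuracy = final_model.score(X_test, y_test)
print(f"held-out accuracy = {test_accuracy:.3f}")
```

Placing the selector inside the pipeline means features are re-selected within each fold, so the cross-validated scores are not inflated by selecting features on data that is later used for validation.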
Palpable spleen, age, and unexplained hemorrhage were identified as the top three features affecting CML 5-year survival. ML models trained on the selected features outperformed those trained on the full-feature dataset. Among the eight ML algorithms, SVM (kernel = RBF) had the best performance in tenfold cross-validation, with an accuracy of 85.7%, specificity of 85%, sensitivity of 86%, F-measure of 87%, kappa statistic of 86.1%, and area under the curve (AUC) of 85% for the selected features. Using the full-feature dataset yielded an accuracy of 69.7%, specificity of 69.1%, sensitivity of 71.3%, F-measure of 72%, kappa statistic of 75.2%, and AUC of 70.1%.
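For readers less familiar with these criteria, the sketch below shows how accuracy, sensitivity, specificity, F-measure, and Cohen's kappa follow from a binary confusion matrix, together with one common way to attach a 95% CI to a proportion. The counts are hypothetical and are not taken from the study, and the normal-approximation (Wald) interval is an assumption, since the abstract does not state how the intervals were computed.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "kappa": kappa,
    }


def wald_ci(p, n, z=1.96):
    """Normal-approximation 95% CI for a proportion p estimated from n cases."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts for illustration only (not the study's results).
metrics = binary_metrics(tp=40, fp=10, tn=40, fn=10)
ci_low, ci_high = wald_ci(metrics["accuracy"], n=100)
print(metrics)
print(f"accuracy 95% CI: ({ci_low:.4f}, {ci_high:.4f})")
```

Because kappa discounts agreement expected by chance, it is typically reported alongside accuracy rather than in place of it.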
Accurate prediction of the survival likelihood of CML patients can help caregivers improve patient prognostication and choose the best possible treatment path. While external validation is required, our developed models could support customized treatment and may guide the prescription of personalized medicine for CML patients.