Wang WenQiang, Ye RenQing, Tang BaoJia, Qi YuYing
Department of Clinical Laboratory, Ningde Municipal Hospital of Ningde Normal University, Ningde, China.
Clin Chim Acta. 2025 Feb 1;567:120025. doi: 10.1016/j.cca.2024.120025. Epub 2024 Nov 7.
The differential diagnosis between iron deficiency anemia (IDA) and thalassemia trait (TT) remains a significant clinical challenge. This study aimed to develop a machine learning-based multi-class model to differentiate among Microcytic-TT (TT with low mean corpuscular volume), Normocytic-TT (TT with normal mean corpuscular volume), IDA, and healthy individuals.
A dataset comprising 1,819 individuals was analyzed using six distinct machine learning algorithms. The eXtreme Gradient Boosting (XGBoost) algorithm was ultimately selected to construct the MultiThal-Classifier (M-THAL) model. SMOTENC (Synthetic Minority Over-sampling Technique for Nominal and Continuous features) was employed to address class imbalance. Model performance was evaluated using several metrics, and SHAP values were applied to interpret the model's predictions. Additionally, external validation was conducted to assess the model's robustness and generalizability.
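The oversampling step can be illustrated with a simplified, standard-library sketch of the SMOTE-NC idea. The abstract does not say which implementation the authors used (library versions such as imbalanced-learn's `SMOTENC` exist), so everything here is an assumption for illustration: the function name `smote_nc_sketch`, the feature layout (MCV, HGB, and a nominal sex column), and the toy values, which are invented and not study data. Continuous features are interpolated between a minority sample and one of its nearest neighbours, and each nominal feature takes the most frequent value among those neighbours.

```python
import random
import statistics
from collections import Counter


def smote_nc_sketch(rows, cont_idx, nom_idx, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows (simplified SMOTE-NC sketch)."""
    rng = random.Random(seed)
    # Penalty added to the distance when a nominal feature differs:
    # the median of the standard deviations of the continuous features.
    penalty = statistics.median(
        [statistics.pstdev([r[i] for r in rows]) for i in cont_idx]
    )

    def dist(a, b):
        d2 = sum((a[i] - b[i]) ** 2 for i in cont_idx)
        d2 += sum(penalty ** 2 for i in nom_idx if a[i] != b[i])
        return d2 ** 0.5

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(rows)
        others = [r for r in rows if r is not base]
        neighbours = sorted(others, key=lambda r: dist(base, r))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position on the segment between base and mate
        new = list(base)
        for i in cont_idx:
            new[i] = base[i] + gap * (mate[i] - base[i])
        for i in nom_idx:
            # Nominal value: majority vote among the nearest neighbours.
            new[i] = Counter(r[i] for r in neighbours).most_common(1)[0][0]
        synthetic.append(new)
    return synthetic


# Invented toy minority-class rows: [MCV (fL), HGB (g/L), sex]
minority = [
    [68.0, 112.0, "F"],
    [71.5, 118.0, "F"],
    [66.2, 109.0, "M"],
    [70.1, 121.0, "F"],
    [69.3, 115.0, "M"],
]
new_rows = smote_nc_sketch(minority, cont_idx=[0, 1], nom_idx=[2], n_new=4)
for row in new_rows:
    print(row)
```

In practice, oversampling of this kind is normally applied to the training split only, so that synthetic rows do not leak into the test set.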
After 1000 bootstrap resamples of the test set, the mean performance metrics of M-THAL, with 95% confidence intervals (CI), were as follows: sensitivity 90.27% (95% CI: 84.88-95.26), specificity 97.87% (95% CI: 97.10-98.55), PPV 93.42% (95% CI: 89.34-96.48), NPV 97.82% (95% CI: 97.00-98.53), F1-score 91.50% (95% CI: 87.29-95.34), Youden's index 88.15% (95% CI: 82.33-93.70), accuracy 97.06% (95% CI: 96.06-97.99), and AUC 94.07% (95% CI: 91.17-96.84). Feature importance analysis identified mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), red cell distribution width - standard deviation (RDW-SD), and hemoglobin (HGB) as the most important features. External validation confirmed the model's robustness and generalizability.
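A bootstrap estimate of this kind can be sketched as follows, using only the standard library. The abstract does not state how per-class metrics were averaged or how the intervals were formed, so the one-vs-rest macro-averaging and the percentile interval below are assumptions, as are the function names and the made-up toy labels. Youden's index is computed as sensitivity + specificity - 1, which is consistent with the reported means up to rounding (90.27 + 97.87 - 100 = 88.14).

```python
import random


def one_vs_rest_metrics(y_true, y_pred, labels):
    """Macro-averaged one-vs-rest sensitivity, specificity, Youden's index."""
    sens, spec = [], []
    pairs = list(zip(y_true, y_pred))
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        tn = len(pairs) - tp - fn - fp
        # A class absent from a resample scores 0.0 here; a real analysis
        # would need an explicit policy for that case.
        sens.append(tp / (tp + fn) if tp + fn else 0.0)
        spec.append(tn / (tn + fp) if tn + fp else 0.0)
    macro_sens = sum(sens) / len(labels)
    macro_spec = sum(spec) / len(labels)
    return {
        "sensitivity": macro_sens,
        "specificity": macro_spec,
        "youden": macro_sens + macro_spec - 1.0,
    }


def bootstrap_ci(y_true, y_pred, labels, n_resamples=1000, alpha=0.05, seed=0):
    """Return {metric: (mean, lower, upper)} using percentile intervals."""
    rng = random.Random(seed)
    n = len(y_true)
    samples = {}
    for _ in range(n_resamples):
        # Resample test-set indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        m = one_vs_rest_metrics(
            [y_true[i] for i in idx], [y_pred[i] for i in idx], labels
        )
        for name, value in m.items():
            samples.setdefault(name, []).append(value)
    result = {}
    for name, values in samples.items():
        values.sort()
        lo_i = max(0, int((alpha / 2) * n_resamples))
        hi_i = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
        result[name] = (sum(values) / len(values), values[lo_i], values[hi_i])
    return result


# Made-up toy labels for the four classes named in the abstract.
labels = ["Microcytic-TT", "Normocytic-TT", "IDA", "Healthy"]
y_true = labels * 10
y_pred = list(y_true)
y_pred[0], y_pred[5], y_pred[10] = "IDA", "Healthy", "Microcytic-TT"

for name, (mean, lo, hi) in bootstrap_ci(y_true, y_pred, labels).items():
    print(f"{name}: {mean:.4f} (95% CI: {lo:.4f}-{hi:.4f})")
```

The same resampling loop extends to PPV, NPV, F1-score, accuracy, and AUC by adding them to the dictionary returned by `one_vs_rest_metrics`.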
The M-THAL model effectively distinguishes Normocytic-TT, Microcytic-TT, IDA, and healthy individuals using hematological parameters, and offers a rapid and cost-effective screening tool that can be readily implemented in diverse healthcare settings.