Tepakhan Wanicha, Srisintorn Wisarut, Penglong Tipparat, Saelue Pirun
Department of Pathology, Faculty of Medicine, Prince of Songkla University, Hat Yai, Songkhla, Thailand.
Centre for Research and Development of Medical Diagnostic Laboratories, Faculty of Associated Medical Sciences, Khon Kaen University, Khon Kaen, Thailand.
Sci Rep. 2025 May 15;15(1):16917. doi: 10.1038/s41598-025-01458-5.
Formulas based on red blood cell indices have been used to differentiate between iron deficiency anemia (IDA) and thalassemia (Thal), but their efficiency varies. In this study, we aimed to develop a tool for discriminating between IDA and Thal using the random forest (RF) and gradient boosting (GB) algorithms. Complete blood count data were collected from 1143 patients with anemia and low mean corpuscular volume (382 with IDA, 635 with Thal, and 126 with both IDA and Thal). The data were randomly divided into training and testing datasets in an 80:20 ratio. The RF and GB models showed good diagnostic performance for predicting IDA and Thal in both the training and testing datasets. In the testing dataset for predicting binary outcomes, GB and RF each achieved an accuracy of 90.7% and an area under the receiver operating characteristic curve (AUC-ROC) of 0.953. Diagnostic performance was lower when patients with both IDA and Thal were included: GB and RF showed accuracies of 80.4% and 82.2% and AUC-ROC values of 0.910 and 0.899, respectively. In conclusion, we developed a machine learning approach using the GB algorithm. This tool is potentially useful in Thal- and IDA-endemic regions.
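The evaluation described in the abstract (a random 80:20 split, then accuracy and AUC-ROC on the held-out portion) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the random seed, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and no RF or GB model is trained here.

```python
import random


def split_80_20(rows, seed=0):
    """Shuffle a copy of rows and return (train, test) in an 80:20 ratio."""
    rng = random.Random(seed)  # seed is an arbitrary assumption
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical patient indices standing in for the 1143 records.
train_ids, test_ids = split_80_20(range(1143))

# Toy labels (1 = Thal, 0 = IDA) and toy classifier scores; not study data.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(len(train_ids), len(test_ids))
print(accuracy(y_true, y_pred))  # 0.75
print(auc_roc(y_true, scores))   # 0.9375
```

In practice the scores would be the predicted class probabilities from a fitted RF or GB model on the testing dataset. Accuracy depends on the chosen threshold, whereas AUC-ROC summarizes ranking quality across all thresholds, which is why the two metrics can order the models differently, as in the three-group results above.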