Classification of α-thalassemia data using machine learning models.

Author information

Christensen Frederik, Kılıç Deniz Kenan, Nielsen Izabela Ewa, El-Galaly Tarec Christoffer, Glenthøj Andreas, Helby Jens, Frederiksen Henrik, Möller Sören, Fuglkjær Alexander Djupnes

Affiliations

Operations Research Group, Department of Materials and Production, Aalborg University, Aalborg, 9220, Denmark.

Publication information

Comput Methods Programs Biomed. 2025 Mar;260:108581. doi: 10.1016/j.cmpb.2024.108581. Epub 2025 Jan 6.

DOI:10.1016/j.cmpb.2024.108581

PMID:39798280

Abstract

BACKGROUND

Around 7% of the global population has congenital hemoglobin disorders, with over 300,000 new cases of α-thalassemia annually. In low-income regions, diagnosis is costly and inaccurate and often relies on complete blood count (CBC) tests. This study uses machine learning (ML) to classify α-thalassemia traits from sex and CBC features, and explores the effect of grouping silent carriers with non-carriers.

METHODS

The dataset includes 288 individuals from Sri Lanka with suspected α-thalassemia. It was classified using eleven discriminant formulae and nine ML models. Outliers were removed using the Mahalanobis distance, and resampling was performed with the synthetic minority oversampling technique (SMOTE) and SMOTE-Nominal Continuous (SMOTE-NC). The Mann-Whitney U test was used for feature extraction and class grouping. ML performance was evaluated with eight criteria.
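The two preprocessing steps named above can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the two-feature restriction, the chi-square cutoff at the 0.975 quantile, the neighbour count, and the sample values are all assumptions made for illustration, and the abstract does not state which thresholds or libraries were used.

```python
import math
import random


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance entries (n - 1 denominator).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^T S^{-1} d, using the closed-form inverse of a 2x2 matrix.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def remove_outliers(points, quantile=0.975):
    """Drop points whose squared distance exceeds a chi-square cutoff (assumed rule)."""
    # For 2 degrees of freedom the chi-square quantile is -2 * ln(1 - q).
    cutoff = -2.0 * math.log(1.0 - quantile)
    return [p for p, d in zip(points, mahalanobis_sq_2d(points)) if d <= cutoff]


def smote_like(minority, n_new, k=3, seed=0):
    """SMOTE-style oversampling: interpolate between a minority sample and
    one of its k nearest minority neighbours. Needs at least two samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, neighbour)))
    return synthetic


# Hypothetical (MCV in fL, RBC in 10^12/L) pairs; the last one is an obvious outlier.
cluster = [(x, y) for x in (74.0, 76.0) for y in (4.9, 5.1)] * 3
samples = cluster + [(100.0, 3.0)]
kept = remove_outliers(samples)
synthetic = smote_like(kept[:4], n_new=5)
```

In practice the features would be scaled before the nearest-neighbour search. Categorical features such as sex need the SMOTE-NC variant, which this sketch does not implement.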

RESULTS

The Ehsani formula achieved an area under the receiver operating characteristic curve (ROC-AUC) of 0.66 when silent and non-carriers were grouped. The convolutional neural network (CNN) without feature extraction performed better, with an accuracy of 0.85, sensitivity of 0.8, specificity of 0.86, and ROC-AUC of 0.95/0.93 (micro/macro). Performance was maintained even without preprocessing.
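As a reminder of how the reported criteria are defined, the following sketch computes accuracy, sensitivity, specificity, and ROC-AUC for a binary problem. The labels and scores are made up for illustration and are not the study's data. The study reports micro- and macro-averaged ROC-AUC, which implies a multi-class setting, whereas this sketch covers only the binary building block.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as half. This equals the Mann-Whitney U statistic divided by
    the number of such pairs."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = carrier) and classifier scores, thresholded at 0.5.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Macro averaging would compute a one-vs-rest AUC per class and take the unweighted mean, while micro averaging pools all class-versus-rest decisions before computing a single AUC.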

CONCLUSION

ML models outperformed classical discriminant formulae in classifying α-thalassemia using sex and CBC features. A larger dataset could improve ML model generalization and the impact of feature extraction. Grouping silent carriers with non-carriers improved ML results, especially with resampling. Silent carriers could not be separated from non-carriers using the available features.
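For context on the classical baselines, a discriminant formula is a fixed arithmetic rule applied to CBC values. The sketch below shows two commonly cited examples, the Mentzer index and the Ehsani formula, with the cutoffs usually quoted for separating thalassemia trait from iron deficiency. These cutoffs and the function names are assumptions for illustration. The abstract does not list the eleven formulae or the thresholds the study applied.

```python
def mentzer_index(mcv_fl, rbc_1e12_per_l):
    """Mentzer index: MCV divided by RBC count."""
    return mcv_fl / rbc_1e12_per_l


def ehsani_index(mcv_fl, rbc_1e12_per_l):
    """Ehsani formula: MCV minus ten times the RBC count."""
    return mcv_fl - 10.0 * rbc_1e12_per_l


def suggests_thalassemia_trait(mcv_fl, rbc_1e12_per_l):
    """Apply commonly quoted cutoffs (assumed here, not taken from the study):
    Mentzer < 13 and Ehsani < 15 point towards thalassemia trait."""
    return {
        "mentzer": mentzer_index(mcv_fl, rbc_1e12_per_l) < 13.0,
        "ehsani": ehsani_index(mcv_fl, rbc_1e12_per_l) < 15.0,
    }


# Hypothetical CBC values: microcytosis with a high RBC count versus a
# mildly low MCV with a normal-to-low RBC count.
likely_trait = suggests_thalassemia_trait(mcv_fl=65.0, rbc_1e12_per_l=5.8)
unlikely_trait = suggests_thalassemia_trait(mcv_fl=75.0, rbc_1e12_per_l=4.2)
```

Because such rules use one or two CBC values and a single threshold, they cannot adapt to the data in the way the ML models in the study can.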
