Classification of α-thalassemia data using machine learning models.

Author information

Christensen Frederik, Kılıç Deniz Kenan, Nielsen Izabela Ewa, El-Galaly Tarec Christoffer, Glenthøj Andreas, Helby Jens, Frederiksen Henrik, Möller Sören, Fuglkjær Alexander Djupnes

Affiliations

Operations Research Group, Department of Materials and Production, Aalborg University, Aalborg, 9220, Denmark.

Publication information

Comput Methods Programs Biomed. 2025 Mar;260:108581. doi: 10.1016/j.cmpb.2024.108581. Epub 2025 Jan 6.

Abstract

BACKGROUND

Around 7% of the global population has a congenital hemoglobin disorder, and more than 300,000 new cases of α-thalassemia occur annually. In low-income regions, diagnosis is costly and inaccurate and often relies on complete blood count (CBC) tests. This study uses machine learning (ML) to classify α-thalassemia traits from sex and CBC features and explores the effect of grouping silent carriers with non-carriers.

METHODS

The dataset includes 288 individuals from Sri Lanka with suspected α-thalassemia. It was classified using eleven discriminant formulae and nine ML models. Outliers were removed using the Mahalanobis distance, and resampling was performed with the synthetic minority oversampling technique (SMOTE) and its nominal-continuous variant (SMOTE-NC). The Mann-Whitney U test was used for feature extraction and class grouping. ML performance was evaluated with eight criteria.
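
The abstract names the preprocessing steps but gives no parameters, so the following is only a minimal sketch of two of them under stated assumptions: Mahalanobis-distance outlier removal on two continuous features, and plain SMOTE interpolation. The 0.975 chi-square cutoff, the neighbour count `k`, the function names, and all sample values are illustrative assumptions, not the authors' settings. The sketch handles continuous features only; SMOTE-NC, which also handles nominal features such as sex, is not reproduced here.

```python
import math
import random


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the sample covariance matrix (n - 1 denominator).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    result = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, written out for the 2x2 case.
        result.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return result


def remove_outliers(points, p=0.975):
    """Keep points whose squared distance is within the chi-square quantile.

    Assumption: cutoff at quantile p with 2 degrees of freedom, where the
    chi-square quantile has the closed form -2 * ln(1 - p).
    """
    cutoff = -2.0 * math.log(1.0 - p)
    distances = mahalanobis_sq_2d(points)
    return [pt for pt, d in zip(points, distances) if d <= cutoff]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        others.sort(key=lambda m: math.dist(m, base))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (nb - b) for b, nb in zip(base, neighbour))
        )
    return synthetic


# Hypothetical (MCV, RBC) pairs; the last point is an artificial outlier.
samples = [(80.0 + i, 4.5 + 0.1 * ((i * 7) % 5)) for i in range(12)]
samples.append((150.0, 9.0))
cleaned = remove_outliers(samples)
extra = smote(cleaned[:4], n_new=3, k=2)
```

Because each synthetic point lies on a segment between two existing minority samples, the oversampled class stays inside the region the original minority data already occupies.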

RESULTS

With silent carriers and non-carriers grouped, the Ehsani formula achieved an area under the receiver operating characteristic curve (ROC-AUC) of 0.66. The convolutional neural network (CNN) without feature extraction performed better, with an accuracy of 0.85, sensitivity of 0.8, specificity of 0.86, and ROC-AUC of 0.95/0.93 (micro/macro). Performance was maintained even without preprocessing.
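
To make the reported metrics concrete, here is a small sketch of how sensitivity, specificity, and micro- versus macro-averaged one-vs-rest ROC-AUC can be computed. It is not the authors' evaluation code; the function names, class labels, and scores are hypothetical, and the one-vs-rest scheme is an assumption about how the micro/macro values were obtained.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def binary_auc(labels, scores):
    """ROC-AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def micro_macro_auc(y_true, prob_rows, classes):
    """One-vs-rest ROC-AUC, returned as (micro, macro).

    Macro: unweighted mean of the per-class AUCs.
    Micro: all (sample, class) pairs pooled into one binary problem.
    """
    per_class = []
    pooled_labels, pooled_scores = [], []
    for j, cls in enumerate(classes):
        labels = [1 if t == cls else 0 for t in y_true]
        scores = [row[j] for row in prob_rows]
        per_class.append(binary_auc(labels, scores))
        pooled_labels += labels
        pooled_scores += scores
    micro = binary_auc(pooled_labels, pooled_scores)
    macro = sum(per_class) / len(per_class)
    return micro, macro


# Hypothetical three-class example; each row holds predicted class probabilities.
classes = ["a", "b", "c"]
y_true = ["a", "b", "c", "a"]
probs = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4], [0.4, 0.5, 0.1]]
micro, macro = micro_macro_auc(y_true, probs, classes)
```

Micro averaging weights every sample-class pair equally, so frequent classes dominate, while macro averaging gives each class the same weight. That difference is one reason the two values can diverge on imbalanced data.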

CONCLUSION

ML models outperformed classical discriminant formulae in classifying α-thalassemia from sex and CBC features. A larger dataset could improve ML model generalization and increase the impact of feature extraction. Grouping silent carriers and non-carriers improved ML results, especially with resampling. Silent carriers could not be separated from non-carriers using the available features.
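
The separability finding rests on the Mann-Whitney U test mentioned in the methods. As a rough illustration only, the sketch below computes the U statistic and a two-sided p-value for one feature in two groups. It assumes the normal approximation without tie or continuity correction, which may differ from what the authors used, and the feature values are invented for the example.

```python
import math


def average_ranks(values):
    """1-based ranks of the values, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U, two-sided p) using the normal approximation.

    Assumption: no tie correction and no continuity correction.
    """
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n1])
    u1 = rank_sum_a - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p


# Hypothetical MCV values for two groups (not data from the study).
silent_carriers = [78.1, 80.4, 79.2, 81.0, 77.5, 80.9]
non_carriers = [79.0, 81.3, 78.4, 80.1, 82.2, 79.8]
u_stat, p_value = mann_whitney_u(silent_carriers, non_carriers)
```

A large p-value for a feature means the test finds no evidence that the feature's distribution differs between the two groups, which is the kind of result that would motivate merging them into one class.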
