Lai Jin-Xin, Tang Jia-Wei, Gong Shan-Shan, Qin Ming-Xiong, Zhang Yu-Lu, Liang Quan-Fa, Li Li-Yan, Cai Zhen, Wang Liang
Laboratory Medicine, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, Guangdong Province, China.
The Marshall Centre for Infectious Diseases Research and Training, The University of Western Australia, Crawley, WA, Australia.
NPJ Digit Med. 2025 Jun 10;8(1):346. doi: 10.1038/s41746-025-01766-0.
Thalassemia is an inherited blood disorder. Current diagnostic methods rely mainly on sophisticated equipment and specially trained technicians. This study aims to identify and genotype thalassemia by applying machine learning (ML) algorithms to routine blood parameters. A derivation cohort of 31,311 individuals was recruited from four independent hospitals, and eight ML models were developed for this task. Model performance was compared using a set of evaluation metrics. An additional cohort of 2,000 patients was recruited for external validation to assess how well the models generalize. The categorical boosting (CatBoost) model showed the best discriminative ability in both the training and external validation cohorts. The model was then integrated into an online platform, which could serve as an auxiliary tool for identifying and genotyping thalassemia through automated analysis of routine blood test parameters.
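The abstract does not name the evaluation metrics used to compare the models. As a minimal sketch only, assuming the comparison includes common discrimination measures such as the area under the ROC curve (AUC), sensitivity, and specificity, the ranking step could look like the following. The function names, the 0.5 threshold, and the toy scores are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called positive.

    Assumes both classes are present in `labels`.
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def rank_models(
    labels: Sequence[int], model_scores: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (model name, AUC) pairs sorted from best to worst AUC."""
    results = [(name, auc(labels, s)) for name, s in model_scores.items()]
    return sorted(results, key=lambda item: item[1], reverse=True)


# Made-up labels and scores for illustration only (1 = carrier, 0 = non-carrier).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = {
    "model_a": [0.9, 0.8, 0.4, 0.3, 0.2, 0.1],
    "model_b": [0.9, 0.3, 0.6, 0.7, 0.2, 0.1],
}

for name, value in rank_models(toy_labels, toy_scores):
    print(f"{name}: AUC = {value:.3f}")
```

In practice the same ranking would be repeated on the external validation cohort, since a model that leads only on the derivation data may not generalize.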