Department of Clinical Laboratory, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China.
Department of Clinical Laboratory, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China.
Clin Chim Acta. 2022 Jan 15;525:1-5. doi: 10.1016/j.cca.2021.12.003. Epub 2021 Dec 6.
Screening for α-thalassemia carriers by low HbA has a low positive predictive value (PPV), only 40.97% in our laboratory, so more effective screening methods are needed. This study aimed to develop a machine learning model that uses red blood cell parameters to identify α-thalassemia carriers among patients with low HbA.
Laboratory data from 1213 patients with low HbA were randomly divided into a training set (849 of 1213, 70%) and an internal validation set (364 of 1213, 30%). An external data set (n = 399) was also used for model validation. Fourteen machine learning methods were applied to construct a discriminant model. Performance was evaluated using accuracy, sensitivity, specificity, and related metrics, and was compared with that of seven previously published discriminant function formulae.
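The 70/30 random split described above can be illustrated with a minimal sketch. It assumes only that each patient record has an identifier; the function name `split_train_validation`, the seed, and the use of rounding to set the training-set size are illustrative assumptions, not the authors' procedure.

```python
import random


def split_train_validation(records, train_fraction=0.7, seed=0):
    """Randomly split records into a training part and a validation part.

    Hypothetical helper: the seed and the rounding rule are assumptions.
    """
    shuffled = list(records)               # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder identifiers standing in for the 1213 low-HbA patients.
patient_ids = list(range(1213))
train_ids, validation_ids = split_train_validation(patient_ids)
print(len(train_ids), len(validation_ids))  # 849 364
```

With 1213 records and a fraction of 0.7, rounding gives the 849 and 364 records reported for the training and internal validation sets.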
The optimal model was a random forest using five clinical features. Its PPV was more than twice that of low HbA screening, and it also had a high negative predictive value (NPV). In the independent test set, the model outperformed the seven formulae for screening α-thalassemia carriers in accuracy (0.915), specificity (0.967), NPV (0.901), PPV (0.942), and area under the receiver operating characteristic curve (AUC, 0.948).
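For readers less familiar with the reported metrics, the sketch below shows how accuracy, sensitivity, specificity, PPV, and NPV follow from a 2×2 confusion matrix. The function name `screening_metrics` and the counts are made up for illustration; they are not the study's data.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute standard screening metrics from confusion-matrix counts.

    tp/fn: carriers the model flags / misses.
    tn/fp: non-carriers the model clears / wrongly flags.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of true carriers detected
        "specificity": tn / (tn + fp),  # share of non-carriers cleared
        "ppv": tp / (tp + fp),          # share of positive calls that are carriers
        "npv": tn / (tn + fn),          # share of negative calls that are non-carriers
    }


# Hypothetical counts chosen only to make the arithmetic easy to follow.
metrics = screening_metrics(tp=40, fp=10, tn=90, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

PPV and NPV depend on the proportion of carriers in the screened population, which is why they are reported alongside sensitivity and specificity.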
A random forest-based model enables rapid discrimination of α-thalassemia carriers among patients with low HbA.