The Second Hospital & Clinical Medical School, Lanzhou University, Lanzhou, People's Republic of China.
Department of Scientific & Application, Sysmex Shanghai Ltd, Shanghai, People's Republic of China.
Int J Lab Hematol. 2024 Oct;46(5):918-926. doi: 10.1111/ijlh.14324. Epub 2024 May 31.
The global burden of multiple myeloma (MM) is increasing every year. We developed machine learning models to support the early detection of MM.
A total of 465 patients and 150 healthy controls were enrolled in this retrospective study. Variables were screened with the least absolute shrinkage and selection operator (LASSO), and three prediction models, logistic regression (LR), support vector machine (SVM), and random forest (RF), were built from complete blood count (CBC) and cell population data (CPD) parameters in the training set (210 cases) and then verified in the validation set (90 cases) and test set (165 cases). Model performance was analyzed using receiver operating characteristic (ROC) curves, calibration curves, and decision curve analysis (DCA). Accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and area under the ROC curve (AUC) were used to evaluate the models. The DeLong test was used to compare the AUCs of the models.
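The evaluation metrics named above can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names, the 0.5 decision threshold, and the eight toy labels and scores are all assumptions made for illustration, and the AUC is computed with the rank-based (Mann-Whitney) definition rather than any specific library routine.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for binary labels (1 = MM, 0 = control).

    Assumes both classes occur in y_true and in y_pred, so that no
    denominator below is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

On this toy data each confusion-matrix metric equals 0.75 and the AUC is 0.8125. The pairwise AUC loop is quadratic in sample size, which is acceptable for cohorts of a few hundred cases such as the one described here.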
Six parameters, RBC (10¹²/L), RDW-CV (%), IG (%), NE-WZ, LY-WX, and LY-WZ, were selected by LASSO to construct the models. The AUCs of the RF model in the training, validation, and test sets were 0.956, 0.892, and 0.875, respectively, higher than those of the LR model (0.901, 0.849, and 0.858) and the SVM model (0.929, 0.868, and 0.846). The DeLong test showed significant differences among the models in the training set, no significant differences in the validation set, and a significant difference only between the SVM and RF models in the test set. The calibration curves and DCA showed that all three models had good validity and feasibility, with the RF model performing best.
The proposed RF model may be a useful auxiliary tool for rapid screening of MM patients.