Qi Yuxiang, Liu Xu, Ding Zhishan, Yu Ying, Zhuang Zhenchao
School of Medical Technology and Information Engineering, Zhejiang Chinese Medical University, Hangzhou, China.
Department of Laboratory Medicine, The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine), Hangzhou, China.
BMC Med Inform Decis Mak. 2024 Dec 18;24(1):379. doi: 10.1186/s12911-024-02781-z.
Aplastic anemia (AA) and myelodysplastic neoplasms (MDS) present with similar peripheral blood findings, and both are characterized clinically by pancytopenia, which makes the two diseases difficult to distinguish. Using machine learning methods, we therefore applied and validated an algorithm that uses cell population data (CPD) parameters to diagnose AA and MDS.
In this retrospective study, CPD parameters were obtained from the Beckman Coulter DxH800 analyzer for 160 individuals diagnosed with AA or MDS. The individuals were randomly assigned to a training cohort (77%) and a testing cohort (23%). In addition, an external validation cohort of 86 elderly patients with AA or MDS from two additional centers was established. Discriminative parameters were identified by univariate analysis, and the most predictive variables were selected using least absolute shrinkage and selection operator (LASSO) regression. Six machine learning algorithms were compared for their performance in distinguishing AA from MDS. Area under the curve (AUC), calibration curves, decision curve analysis (DCA), and Shapley additive explanations (SHAP) plots were used to assess and interpret the model's predictive accuracy, clinical utility, and stability.
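The cohort split and the AUC metric described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' code: the function names `split_cohort` and `auc`, the seed, and the use of integer patient IDs are all assumptions, and the abstract does not report the exact cohort sizes or how the randomization was performed.

```python
import random


def split_cohort(patient_ids, train_fraction=0.77, seed=0):
    """Randomly split patient IDs into training and testing lists.

    Hypothetical helper: the 77%/23% ratio comes from the abstract,
    the seed and the rounding rule are assumptions.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case,
    with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 160 placeholder IDs; with this rounding rule the split is 123 / 37.
# The paper's actual cohort counts are not stated in the abstract.
train_ids, test_ids = split_cohort(range(160))
print(len(train_ids), len(test_ids))

# Toy labels and scores (made up for illustration only).
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise form of the AUC is quadratic in cohort size, which is acceptable for cohorts of this scale; a rank-based formulation would be preferable for much larger data sets.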
After comparative evaluation, logistic regression emerged as the most suitable machine learning model for distinguishing AA from MDS. It used five principal variables (age, MNVLY, SDVLY, MNLALSEGC, and MNCEGC) to estimate the probability of disease. The model achieved an AUC of 0.791 in the testing cohort, with high specificity (0.850) and positive predictive value (0.818). The calibration curve indicated excellent agreement between predicted and observed probabilities, and the DCA curve supported the model's clinical utility in guiding treatment decisions. The model's performance remained consistent in the external validation cohort, with an AUC of 0.719.
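For readers unfamiliar with the reported metrics, the sketch below shows how specificity, positive predictive value, and the net benefit plotted in a decision curve are computed from confusion-matrix counts. The counts are invented for illustration and are not the study's data; the function names are likewise hypothetical.

```python
def specificity(tn, fp):
    """Fraction of true negatives among all actual negatives."""
    return tn / (tn + fp)


def positive_predictive_value(tp, fp):
    """Fraction of true positives among all predicted positives."""
    return tp / (tp + fp)


def net_benefit(tp, fp, n, threshold):
    """Net benefit at a given threshold probability, as used in DCA.

    NB = TP/n - (FP/n) * (pt / (1 - pt)), where pt is the threshold
    probability at which a clinician would act on a positive result.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


# Illustrative counts only (NOT taken from the paper).
tp, fp, tn, fn = 8, 2, 18, 4
n = tp + fp + tn + fn

print(specificity(tn, fp))                # 0.9
print(positive_predictive_value(tp, fp))  # 0.8
print(net_benefit(tp, fp, n, 0.5))        # 0.1875
```

A decision curve repeats the net-benefit calculation over a range of thresholds and compares the model against the "treat all" and "treat none" strategies.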
We developed a novel model that effectively distinguished elderly patients with AA from those with MDS. It has the potential to assist physicians in the early diagnosis and appropriate treatment of elderly patients.