Department of Infection Control, The Third Affiliated Hospital of Soochow University, Changzhou, 213003, China.
BMC Infect Dis. 2023 May 4;23(1):284. doi: 10.1186/s12879-023-08235-7.
This study aimed to develop and validate machine learning models for predicting invasive Klebsiella pneumoniae liver abscess syndrome (IKPLAS) in patients with diabetes mellitus, and to compare the performance of the different models.
Clinical signs and admission data from 213 diabetic patients with Klebsiella pneumoniae liver abscess were collected as candidate variables. The optimal feature variables were selected, and Artificial Neural Network, Support Vector Machine (SVM), Logistic Regression, Random Forest, K-Nearest Neighbor (KNN), Decision Tree (DT), and XGBoost (XGB) models were then built. Model performance was evaluated using the ROC curve, sensitivity (recall), specificity, accuracy, precision, F1-score, Average Precision (AP), the calibration curve, and Decision Curve Analysis (DCA).
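The threshold-based metrics and the ROC AUC listed above can be sketched in a few lines. This is a minimal illustration, not the study's code: the function names and the toy labels and scores are hypothetical, and label 1 is assumed to mean IKPLAS.

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / len(y_true)
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Assumes at least one positive and one negative.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed cut-off of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Sensitivity, specificity, precision, accuracy, and F1 depend on the chosen cut-off, whereas AUC summarizes ranking quality across all cut-offs. That difference is one reason a model can lead on AUC while another leads on specificity.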
Four features (hemoglobin, platelets, D-dimer, and SOFA score) were selected by the recursive elimination method, and the seven prediction models were built on these variables. The SVM model had the highest AUC (0.969), F1-score (0.737), sensitivity (0.875), and AP (0.890) of the seven models, while the KNN model had the highest specificity (1.000). The XGB and DT models overestimated the risk of IKPLAS; the calibration curves of the other models fit the observed outcomes well. Decision Curve Analysis showed that at risk thresholds between 0.4 and 0.8, the net rate of intervention of the SVM model was significantly higher than that of the other models. In the feature importance ranking, the SOFA score had a significant impact on the model.
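For readers unfamiliar with decision curve analysis, the quantity usually plotted against the risk threshold is the net benefit, TP/n - (FP/n) * pt / (1 - pt), where pt is the threshold probability. The sketch below computes it for a model and for the "treat all" reference strategy. It is an illustration under assumptions: the abstract does not state how its "net rate of intervention" was computed, and the function names and toy data here are hypothetical.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of intervening on patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: intervene on every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy data, not taken from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]

nb_model = net_benefit(y_true, probs, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
```

A decision curve repeats this calculation over a grid of thresholds (for example 0.4 to 0.8) for each model. The "treat none" strategy has a net benefit of zero at every threshold.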
Machine learning algorithms can be used to build an effective prediction model for invasive Klebsiella pneumoniae liver abscess syndrome in patients with diabetes mellitus, and such a model has potential application value.