Xu Xiaobo
Department of Clinical Laboratory, Zhejiang Rong Jun Hospital, Jiaxing, 314000, China.
Diagn Microbiol Infect Dis. 2024 Oct;110(2):116467. doi: 10.1016/j.diagmicrobio.2024.116467. Epub 2024 Jul 30.
In this study, 80 carbapenem-resistant Klebsiella pneumoniae (CR-KP) and 160 carbapenem-susceptible Klebsiella pneumoniae (CS-KP) strains isolated in the clinic were selected, and their matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) peaks were collected. K-means clustering was performed on the MS peak data to obtain the best "feature peaks", and four machine learning models were built and compared by area under the ROC curve, specificity, sensitivity, test set score, and ten-fold cross-validation score. Model parameters were tuned to improve test performance while reducing overfitting. The areas under the ROC curve of the Random Forest, Support Vector Machine, Logistic Regression, and XGBoost models were 0.99, 0.97, 0.96, and 0.97, respectively; the model scores on the test set were 0.94, 0.91, 0.90, and 0.93, respectively; and the ten-fold cross-validation scores were 0.84, 0.81, 0.81, and 0.85, respectively. Machine learning algorithms combined with MALDI-TOF MS data can enable rapid detection of CR-KP, shorten the in-laboratory reporting time, and provide fast and reliable discrimination between CR-KP and CS-KP.
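The workflow described above (K-means on peak data, several classifiers, then ROC AUC, test set score, and ten-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the reading of K-means as "cluster the m/z bins and keep one representative bin per cluster" is only one possible interpretation, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, and all function names, cluster counts, and hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_spectra(seed=0):
    """Synthetic stand-in for binned peak intensities (NOT real MALDI-TOF data)."""
    rng = np.random.default_rng(seed)
    y = np.array([1] * 80 + [0] * 160)        # 1 = CR-KP, 0 = CS-KP (counts from the abstract)
    latent = rng.normal(size=(y.size, 10))    # 10 hypothetical groups of co-varying peaks
    latent[y == 1, :2] += 1.5                 # assume two groups differ between classes
    group = np.repeat(np.arange(10), 6)       # 60 bins, 6 bins per group
    X = latent[:, group] + 0.3 * rng.normal(size=(y.size, group.size))
    return X, y


def select_feature_peaks(X_train, n_clusters=10, seed=0):
    """Cluster the bins (columns) with K-means; keep the bin nearest each centroid."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X_train.T)
    dist = km.transform(X_train.T)            # shape: (n_bins, n_clusters)
    return np.unique(dist.argmin(axis=0))     # one representative bin index per cluster


def run_sketch(seed=0):
    X, y = make_synthetic_spectra(seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # Peaks are chosen on the training split only, without using the labels.
    peaks = select_feature_peaks(X_tr, seed=seed)
    X_tr, X_te = X_tr[:, peaks], X_te[:, peaks]

    models = {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "svm": make_pipeline(StandardScaler(), SVC(probability=True, random_state=seed)),
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        # Stand-in for XGBoost so the sketch needs only scikit-learn.
        "gradient_boosting": GradientBoostingClassifier(n_estimators=50, random_state=seed),
    }
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)

    results = {}
    for name, model in models.items():
        # Ten-fold cross-validated accuracy on the training split.
        cv_acc = cross_val_score(model, X_tr, y_tr, cv=cv, scoring="accuracy").mean()
        model.fit(X_tr, y_tr)
        results[name] = {
            "auc": roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]),
            "test_acc": model.score(X_te, y_te),
            "cv_acc": cv_acc,
        }
    return results, peaks


results, peaks = run_sketch()
for name, m in results.items():
    print(f"{name}: AUC={m['auc']:.2f} test={m['test_acc']:.2f} 10-fold CV={m['cv_acc']:.2f}")
```

A gap between the test set score and the cross-validated score, like the one reported in the abstract (for example 0.94 versus 0.84 for Random Forest), is the kind of signal such a comparison is meant to surface. The numbers printed by this sketch come from synthetic data and say nothing about the study's results.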