Department of Clinical Laboratory, The Second Hospital of Nanping, Nanping, 354200, Fujian, China.
Department of Clinical Laboratory, Fuding Hospital, Fujian University of Traditional Chinese Medicine, 120 South Road of Old City, Fuding, 355200, Fujian, China.
Sci Rep. 2024 Sep 30;14(1):22673. doi: 10.1038/s41598-024-74057-5.
The COVID-19 pandemic has underscored the need for precise diagnostic methods that distinguish between clinically similar respiratory infections, such as COVID-19 and Mycoplasma pneumoniae (MP) infection. Identifying key biomarkers and applying machine learning techniques, such as random forest analysis, can improve diagnostic accuracy. We conducted a retrospective analysis of clinical and laboratory data from 214 patients with acute respiratory infections, collected between October 2022 and October 2023 at the Second Hospital of Nanping. The study population was categorized into three groups: COVID-19 positive (n = 52), MP positive (n = 140), and co-infected (n = 22). Key biomarkers, including C-reactive protein (CRP), procalcitonin (PCT), interleukin-6 (IL-6), and white blood cell (WBC) count, were evaluated. Correlation analyses were conducted to assess relationships between biomarkers within each group, and random forest analysis was applied to evaluate the discriminative power of these biomarkers. The random forest model achieved an area under the ROC curve (AUC) of 0.86 (95% CI: 0.70-0.97) for COVID-19, 0.79 (95% CI: 0.64-0.92) for MP, and 0.69 (95% CI: 0.50-0.87) for co-infection, with a micro-average AUC of 0.90 (95% CI: 0.83-0.95). The micro-average area under the precision-recall curve for the random forest classifier was 0.80 (95% CI: 0.69-0.91). Confusion matrices indicated an overall model accuracy of 0.77 and illustrated relationships among the biomarkers. SHAP feature importance analysis indicated that age (0.27), CRP (0.25), IL-6 (0.14), and PCT (0.14) were the most significant predictors. Integrating computational methods, particularly random forest analysis, into the evaluation of clinical and biomarker data is a promising approach for enhancing diagnostic processes for infectious diseases.
Our findings support the use of specific biomarkers in differentiating between COVID-19 and MP, potentially leading to more targeted and effective diagnostic strategies. This study underscores the potential of machine learning techniques in improving disease classification in the era of precision medicine.
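The abstract reports one AUC per class plus a micro-average, which suggests a one-vs-rest evaluation of the three-class model. The following minimal Python sketch shows how such figures can be derived from predicted class probabilities. It is not the authors' pipeline: the function names (`roc_auc`, `one_vs_rest_auc`) are hypothetical, the six-patient probability table is invented purely for illustration, and the AUC is computed with the pairwise-ranking definition rather than with any particular library.

```python
from typing import Dict, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(
    y_true: Sequence[str],
    proba: Sequence[Sequence[float]],
    classes: Sequence[str],
) -> Dict[str, float]:
    """Per-class one-vs-rest AUC plus the micro-average.

    The micro-average pools every (sample, class) pair into one binary
    problem, so frequent classes contribute more pairs than rare ones.
    """
    result: Dict[str, float] = {}
    pooled_labels: List[int] = []
    pooled_scores: List[float] = []
    for k, cls in enumerate(classes):
        binary = [1 if y == cls else 0 for y in y_true]
        column = [row[k] for row in proba]
        result[cls] = roc_auc(binary, column)
        pooled_labels.extend(binary)
        pooled_scores.extend(column)
    result["micro"] = roc_auc(pooled_labels, pooled_scores)
    return result


# Invented toy data (NOT from the study): true group and predicted
# probabilities in the column order given by CLASSES.
CLASSES = ["COVID-19", "MP", "co-infection"]
y_true = ["COVID-19", "COVID-19", "MP", "MP", "MP", "co-infection"]
proba = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.8, 0.1],
    [0.5, 0.3, 0.2],
    [0.3, 0.3, 0.4],
]

aucs = one_vs_rest_auc(y_true, proba, CLASSES)
for name, value in aucs.items():
    print(f"{name}: {value:.3f}")
```

Because the micro-average pools all sample-class pairs, it can sit above every per-class value, as it does in the abstract (0.90 versus 0.86, 0.79, and 0.69), and it is weighted toward the largest group. Confidence intervals like those reported would typically come from resampling patients, for example a bootstrap, which this sketch omits.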