College of Public Health, Zhengzhou University, Zhengzhou, China.
Department of Radiation Oncology, Zhengzhou University People's Hospital, Henan Provincial People's Hospital, Zhengzhou, China.
Front Public Health. 2022 Jul 28;10:938801. doi: 10.3389/fpubh.2022.938801. eCollection 2022.
Pneumonia is an infection of the lungs characterized by high morbidity and mortality. The use of machine learning systems to detect respiratory diseases from non-invasive measures, such as physical and laboratory parameters, is gaining momentum and has been proposed as a way to decrease the diagnostic uncertainty associated with bacterial pneumonia. In this study, we conducted several experiments using eight machine learning models to predict pneumonia based on biomarkers, laboratory parameters, and physical features.
We performed machine learning analysis on data from 535 patients, each described by 45 features. All real-valued features were rescaled by data normalization. Since this is a binary classification problem, each patient was assigned to exactly one of two classes. We designed three experiments to evaluate the models: (1) feature selection techniques to select appropriate features for the models, (2) experiments on the imbalanced original dataset, and (3) experiments on the dataset balanced with SMOTE (Synthetic Minority Over-sampling Technique). We then compared eight machine learning models to evaluate their effectiveness in predicting pneumonia.
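The normalization and oversampling steps described above can be illustrated with a minimal sketch. The abstract does not state which normalization scheme or SMOTE implementation was used, so min-max scaling and the simplified interpolation below are assumptions, and the function names (`min_max_scale`, `smote_like`) and the toy feature values are invented for illustration only.

```python
import random


def min_max_scale(rows):
    """Rescale each column to [0, 1]; constant columns map to 0.0.

    Assumption: min-max scaling stands in for the unspecified normalization.
    """
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples in the spirit of SMOTE.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Sort the other minority samples by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Toy, made-up values (two hypothetical features per patient), not study data.
features = [[36.5, 5.0], [39.1, 120.0], [38.4, 80.0], [37.0, 10.0], [38.9, 95.0]]
labels = [0, 1, 1, 0, 1]

scaled = min_max_scale(features)
minority = [x for x, y in zip(scaled, labels) if y == 0]
majority = [x for x, y in zip(scaled, labels) if y == 1]

# Add just enough synthetic minority samples to balance the two classes.
new_points = smote_like(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```

In practice, a maintained implementation would usually be preferable to a hand-written one, and oversampling is normally applied to the training split only so that synthetic points do not leak into evaluation.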
Biomarkers such as C-reactive protein and procalcitonin demonstrated the most significant discriminating power. Ensemble machine learning models such as RF (accuracy = 92.0%, precision = 91.3%, recall = 96.0%, F1-score = 93.6%) and XGBoost (accuracy = 90.8%, precision = 92.6%, recall = 92.3%, F1-score = 92.4%) achieved the highest performance on the original dataset, with AUCs of 0.96 and 0.97, respectively. On the SMOTE dataset, RF and XGBoost again achieved the best prediction results, with F1-scores of 92.0% and 91.2%, respectively, and an AUC of 0.97 for both models.
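As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes it from the precision and recall values quoted above for the original dataset; the function and variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as quoted for the original dataset.
reported = {
    "RF": (0.913, 0.960, 0.936),
    "XGBoost": (0.926, 0.923, 0.924),
}

for name, (p, r, f1_reported) in reported.items():
    f1 = f1_score(p, r)
    print(f"{name}: recomputed F1 = {f1:.3f}, reported F1 = {f1_reported:.3f}")
```

Both recomputed values agree with the reported F1-scores to within rounding.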
Our models showed that, in the diagnosis of pneumonia, clinical history, laboratory indicators, and symptoms do not have adequate discriminatory power when considered individually. We also conclude that the ensemble ML models performed better than the other models in this study.