Spaimoc Radu, Mateo Jordi, Solsona Francesc, Jover-Sáenz Alfredo, Barcenilla Fernando, Ramírez-Hidalgo María, Serrano Marcos, Mesas Miquel, Florensa Dídac
Department of Computer Science, University of Lleida, C/ Jaume II, 69, Lleida, 25001, Spain.
Nosocomial Infection Unit, Arnau de Vilanova Universitary Hospital of Lleida, Av. Alcalde Rovira Roure, Lleida, 25198, Spain.
BMC Med Inform Decis Mak. 2025 Aug 11;25(1):299. doi: 10.1186/s12911-025-03113-5.
Healthcare-associated infections (HAIs), particularly Vascular Catheter-Associated Infections (VCAIs), are a significant concern: they account for over 7% of all infections and are often linked to medical devices. Detecting VCAIs early, before invasive infection develops, is crucial for improving hospital care and reducing antibiotic use. This retrospective study developed and evaluated machine learning models that classify VCAIs from patient medical records, excluding fever and antibiotic prescription indicators. The dataset, collected from the group of public hospitals of the Lleida health region in Catalonia (Spain) between 2011 and 2019, consisted of 24,239 episodes with 150 features related to vascular catheter use. After validation, processing and feature engineering, the dataset was strongly imbalanced, with 10,090 (94.46%) non-catheter episodes and 591 (5.53%) catheter infection cases. This imbalance made VCAI classification difficult for the machine learning classifiers. While most models achieved high accuracy and specificity (approximately 97%), their sensitivity was limited, reaching only around 60% in the best-performing cases. Among the evaluated classifiers, the Gradient Boosting (GB) model outperformed the others, attaining the highest balanced accuracy (82.5%) and sensitivity (67%), which underscores its potential utility for early VCAI detection. The analysis also examined the impact of oversampling techniques on model performance. Although these methods improved the metrics of some classifiers, they did not consistently outperform models trained on the original dataset. When oversampling yields no significant improvement, training on the original dataset is therefore preferable. This study highlights that strategic feature engineering with the GB classifier is sufficient to obtain robust VCAI detection before probable sepsis appears.
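To illustrate why accuracy alone can mislead on a dataset this imbalanced, the following minimal Python sketch computes sensitivity, specificity, accuracy and balanced accuracy from confusion-matrix counts. Only the class sizes (10,090 and 591) come from the abstract. The `Confusion` class and the counts in `hypothetical_model` are assumptions made for illustration, chosen to land near the reported GB figures; they are not results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a binary classifier (hypothetical helper)."""
    tp: int  # infections correctly flagged
    fn: int  # infections missed
    tn: int  # non-infections correctly cleared
    fp: int  # non-infections wrongly flagged

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total

    @property
    def balanced_accuracy(self) -> float:
        # Mean of the per-class recalls, so both classes weigh equally.
        return (self.sensitivity + self.specificity) / 2


# Class sizes reported in the abstract.
N_NEGATIVE, N_POSITIVE = 10_090, 591

# A trivial baseline that never predicts an infection.
always_negative = Confusion(tp=0, fn=N_POSITIVE, tn=N_NEGATIVE, fp=0)

# Illustrative counts only (assumption): picked to give roughly 67%
# sensitivity and 98% specificity. Not taken from the paper.
hypothetical_model = Confusion(tp=396, fn=195, tn=9_888, fp=202)

for name, cm in [("always negative", always_negative),
                 ("hypothetical model", hypothetical_model)]:
    print(f"{name}: accuracy={cm.accuracy:.3f} "
          f"sensitivity={cm.sensitivity:.3f} "
          f"specificity={cm.specificity:.3f} "
          f"balanced_accuracy={cm.balanced_accuracy:.3f}")
```

With these class sizes, the always-negative baseline reaches about 94.5% accuracy while detecting no infections, and its balanced accuracy is only 50%. This is why balanced accuracy and sensitivity are the more informative metrics here.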