Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada.
The Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada.
Antimicrob Resist Infect Control. 2023 Sep 2;12(1):88. doi: 10.1186/s13756-023-01294-0.
Population-based surveillance of surgical site infections (SSIs) requires precise case-finding strategies. We sought to develop and validate machine learning models to automate the detection of complex (deep incisional/organ space) SSI cases.
This retrospective cohort study included adult patients (age ≥ 18 years) admitted to acute care hospitals in Calgary, Canada, who underwent primary elective total hip (THA) or knee (TKA) arthroplasty between January 1, 2013 and August 31, 2020. True SSI status was determined by Alberta Health Services Infection Prevention and Control (IPC) program staff. Using the IPC cases as labels, we developed and validated nine XGBoost models to identify deep incisional SSIs, organ space SSIs, and complex SSIs using administrative data, electronic medical record (EMR) free-text data, and both combined. Model performance was assessed by sensitivity, specificity, positive predictive value, negative predictive value, F1 score, the area under the receiver operating characteristic curve (ROC AUC), and the area under the precision-recall curve (PR AUC). Bootstrap 95% confidence intervals (95% CIs) were also calculated.
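As an illustration of the threshold-based metrics listed above, the following is a minimal sketch in plain Python. It is not the study's code: the function names, the toy labels, the number of resamples, and the choice of a percentile bootstrap are all assumptions made for the example, since the abstract does not specify how the intervals were computed. ROC AUC and PR AUC are omitted because they require predicted scores rather than hard class labels.

```python
import random


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_div(num, den):
    """Divide, returning NaN when the denominator is zero."""
    return num / den if den else float("nan")


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, NPV and F1 from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "ppv": safe_div(tp, tp + fp),
        "npv": safe_div(tn, tn + fn),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
    }


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for one metric (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample cases with replacement, keeping label/prediction pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        value = classification_metrics(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )[metric]
        if value == value:  # skip NaN from degenerate resamples
            stats.append(value)
    stats.sort()
    lower = stats[int((alpha / 2) * (len(stats) - 1))]
    upper = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lower, upper


# Toy data (hypothetical): 10 true cases among 100 procedures.
y_true = [1] * 10 + [0] * 90
y_pred = [1] * 8 + [0] * 2 + [1] * 3 + [0] * 87

metrics = classification_metrics(y_true, y_pred)
f1_low, f1_high = bootstrap_ci(y_true, y_pred, "f1")
print(metrics)
print(f"F1 95% CI: {f1_low:.2f}-{f1_high:.2f}")
```

With a rare outcome such as SSI, some bootstrap resamples may contain no positive cases, which is why the sketch drops undefined (NaN) values before taking percentiles.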
The cohort comprised 22,059 unique patients with 27,360 hospital admissions, totalling 88,351 days of hospital stay. These admissions included 16,561 (60.5%) TKA and 10,799 (39.5%) THA procedures. There were 235 ascertained SSIs: 77 (32.8%) superficial incisional, 57 (24.3%) deep incisional, and 101 (42.9%) organ space. The incidence rates per 100 surgical procedures were 0.28 for superficial incisional SSIs, 0.21 for deep incisional SSIs, 0.37 for organ space SSIs, and 0.58 for complex SSIs. The optimal XGBoost models using administrative data and text data combined achieved a ROC AUC of 0.906 (95% CI 0.835-0.978), a PR AUC of 0.637 (95% CI 0.528-0.746), and an F1 score of 0.79 (95% CI 0.67-0.90).
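The incidence figures for the complex SSI categories can be reproduced from the counts above. This short sketch assumes the denominator is the 27,360 procedures (16,561 TKA plus 10,799 THA) and that complex SSIs are the sum of deep incisional and organ space SSIs, as defined earlier in the abstract; the variable and function names are illustrative only.

```python
# Counts taken from the abstract; names are illustrative.
PROCEDURES = 16_561 + 10_799  # TKA + THA = 27,360

ssi_counts = {"deep incisional": 57, "organ space": 101}
# Complex SSI = deep incisional + organ space.
ssi_counts["complex"] = ssi_counts["deep incisional"] + ssi_counts["organ space"]


def incidence_per_100(cases, procedures):
    """Incidence expressed per 100 surgical procedures."""
    return 100 * cases / procedures


rates = {name: round(incidence_per_100(n, PROCEDURES), 2)
         for name, n in ssi_counts.items()}
print(rates)
# {'deep incisional': 0.21, 'organ space': 0.37, 'complex': 0.58}
```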
Our findings suggest that machine learning models derived from administrative data and EMR text data achieve high performance and can be used to automate the detection of complex SSIs.