Clarke Erik, Chehoud Christel, Khan Najat, Spiessens Bart, Poolman Jan, Geurtsen Jeroen
Janssen Research and Development Data Sciences, Spring House, PA, USA.
Janssen Research and Development, Beerse, Belgium.
BMC Infect Dis. 2024 Aug 8;24(1):796. doi: 10.1186/s12879-024-09669-3.
Invasive Escherichia coli disease (IED), also known as invasive extraintestinal pathogenic E. coli disease, is a leading cause of sepsis and bacteremia in older adults. It can result in hospitalization and sometimes death, and it is frequently associated with antimicrobial resistance. Certain patient characteristics may also increase the risk of developing IED. This study aimed to validate a machine learning approach for the unbiased identification of potential risk factors that correlate with an increased risk for IED.
Using electronic health records from 6.5 million people, an XGBoost model was trained to predict IED from 663 distinct patient features, and the most predictive features were identified as potential risk factors. Using SHapley Additive exPlanations (SHAP) values, the specific relationships between features and the outcome of developing IED were characterized.
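To make the idea of Shapley-value attribution concrete, the following is a minimal sketch that computes exact Shapley values for a toy risk score. It is not the study's pipeline: the feature names (`age_over_60`, `prior_uti`, `ed_visit`), the scoring function `toy_risk`, and all weights are hypothetical and chosen only for illustration. The study itself applied this kind of attribution to a trained XGBoost model with 663 features, where exact enumeration as done here would be infeasible.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float.

    Enumerates every subset of the other features, so it is only
    practical for a handful of features.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_risk(present):
    """Hypothetical risk score; the weights are invented for illustration."""
    score = 0.0
    if "age_over_60" in present:
        score += 2.0
    if "prior_uti" in present:
        score += 1.0
    if "age_over_60" in present and "prior_uti" in present:
        score += 1.0  # interaction term, shared equally by both features
    return score


features = ["age_over_60", "prior_uti", "ed_visit"]
phi = shapley_values(toy_risk, features)
for name, value in phi.items():
    print(f"{name}: {value:.2f}")
```

In this toy example the interaction term is split evenly between the two features that produce it, and the attributions sum to the difference between the score with all features present and the score with none. That additivity is the property that lets per-feature attributions be read as contributions to a single prediction.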
The model independently predicted that older age, a known risk factor for IED, increased the chance of developing IED. The model also predicted that a history of ≥ 1 urinary tract infection, more frequent and/or more recent urinary tract infections, and ≥ 1 emergency department or inpatient visit increased the risk for IED. Outcomes were used to calculate risk ratios in selected subpopulations, demonstrating the impact of individual features or combinations of features on the incidence of IED.
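A risk ratio of the kind described above compares the incidence of the outcome in a subpopulation with a given feature against the incidence in a subpopulation without it. The sketch below is one common way to compute it, with a 95% confidence interval on the log scale (the Katz method). The abstract does not state how the study computed its intervals, so that choice is an assumption, as are the function name `risk_ratio` and the counts, which are invented for illustration and are not study results.

```python
from math import exp, log, sqrt


def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total, z=1.96):
    """Risk ratio with an approximate 95% CI (log-scale Katz interval).

    Assumes all four counts are positive; zero cells would need a
    continuity correction, which this sketch does not handle.
    """
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR).
    se = sqrt(
        1 / exposed_cases - 1 / exposed_total
        + 1 / unexposed_cases - 1 / unexposed_total
    )
    lower = exp(log(rr) - z * se)
    upper = exp(log(rr) + z * se)
    return rr, lower, upper


# Hypothetical counts: 30 cases among 1,000 people with the feature,
# 10 cases among 1,000 people without it.
rr, lower, upper = risk_ratio(30, 1000, 10, 1000)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that excludes 1 suggests the difference in incidence between the two subpopulations is unlikely to be due to chance alone, though it says nothing about causation.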
This study illustrates the viability and validity of using large electronic health records datasets and machine learning to identify correlating features and potential risk factors for infectious diseases, including IED. The next step is the independent validation of potential risk factors using conventional methods.