Division of Haematology/Oncology, The Hospital for Sick Children, 555 University Avenue, Toronto, Ontario, M5G1X8, Canada.
Biomedical Informatics Research, Stanford University, Palo Alto, USA.
BMC Cancer. 2020 Nov 13;20(1):1103. doi: 10.1186/s12885-020-07618-2.
The objectives were to build a machine learning algorithm to identify bloodstream infection (BSI) among pediatric patients with cancer and hematopoietic stem cell transplantation (HSCT) recipients, and to compare this approach with the presence of neutropenia alone as an identifier of BSI.
We included patients 0-18 years of age at cancer diagnosis or HSCT between January 2009 and November 2018. Eligible blood cultures were those with no previous blood culture (regardless of result) within the preceding 7 days. The primary outcome was BSI. Four machine learning algorithms were used: elastic net, support vector machine, and two implementations of gradient boosting machine (GBM and XGBoost). Model training and evaluation were performed using temporally disjoint training (60%), validation (20%) and test (20%) sets. The best model was compared to neutropenia alone in the test set.
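A temporally disjoint 60/20/20 split of the kind described above can be sketched as follows. This is a minimal illustration, not the authors' code: the `collected_on` field name and the `temporal_split` helper are hypothetical, and it assumes the split is made per blood culture in order of collection date, which the abstract does not specify.

```python
from datetime import date, timedelta

def temporal_split(records, train_pct=60, valid_pct=20):
    """Split records into train/validation/test sets by collection date.

    Records are sorted chronologically, so every training record precedes
    every validation record, which in turn precedes every test record.
    Integer percentages avoid floating-point rounding at the boundaries.
    """
    ordered = sorted(records, key=lambda r: r["collected_on"])
    n = len(ordered)
    train_end = n * train_pct // 100
    valid_end = n * (train_pct + valid_pct) // 100
    return ordered[:train_end], ordered[train_end:valid_end], ordered[valid_end:]

# Hypothetical toy data: ten cultures collected on consecutive days,
# deliberately supplied in reverse order to show that sorting matters.
start = date(2009, 1, 1)
cultures = [
    {"culture_id": i, "collected_on": start + timedelta(days=i)}
    for i in reversed(range(10))
]
train, valid, test = temporal_split(cultures)
```

Splitting by time rather than at random means the test set contains only cultures collected after those used for training, which is closer to how such a model would be used prospectively.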
Of 11,183 eligible blood cultures, 624 (5.6%) were positive. The best model in the validation set was GBM, which achieved an area under the receiver operating characteristic curve (AUROC) of 0.74 in the test set. Among the 2236 blood cultures in the test set, the number of false positives and the specificity of GBM vs. neutropenia were 508 vs. 592 and 0.76 vs. 0.72, respectively. Among 139 test set BSIs, six (4.3%) occurred in non-neutropenic patients and were identified by GBM. All six received antibiotics before culture results became available.
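The reported specificities can be checked from the counts given above. The sketch below assumes that the 2236 test set items are blood cultures and that the number of true negatives plus false positives equals 2236 minus the 139 BSIs; the abstract does not state this explicitly, but the arithmetic is consistent with the reported values.

```python
def specificity(false_positives, negatives):
    """Specificity = true negatives / all actual negatives."""
    return (negatives - false_positives) / negatives

TEST_SET_CULTURES = 2236
TEST_SET_BSIS = 139
# Assumption: every test set culture without BSI counts as a negative.
negatives = TEST_SET_CULTURES - TEST_SET_BSIS

gbm_specificity = specificity(508, negatives)
neutropenia_specificity = specificity(592, negatives)

# Share of test set BSIs found by GBM in non-neutropenic patients.
additional_case_share = 6 / TEST_SET_BSIS

print(round(gbm_specificity, 2), round(neutropenia_specificity, 2))
print(round(100 * additional_case_share, 1))
```

Under this assumption, GBM produced 84 fewer false positives than neutropenia alone across 2097 cultures without BSI.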
We developed a machine learning algorithm to classify BSI. GBM achieved an AUROC of 0.74 and identified an additional 4.3% of true BSI cases (six non-neutropenic patients) in the test set. The machine learning algorithm did not perform substantially better than using the presence of neutropenia alone to predict BSI.