Risk-Specific Training Cohorts to Address Class Imbalance in Surgical Risk Prediction.

Author information

Balch Jeremy A, Ruppert Matthew M, Guan Ziyuan, Buchanan Timothy R, Abbott Kenneth L, Shickel Benjamin, Bihorac Azra, Liang Muxuan, Upchurch Gilbert R, Tignanelli Christopher J, Loftus Tyler J

Affiliations

Department of Surgery, University of Florida, Gainesville.

Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville.

Publication information

JAMA Surg. 2024 Dec 1;159(12):1424-1431. doi: 10.1001/jamasurg.2024.4299.

DOI:10.1001/jamasurg.2024.4299

PMID:39382865

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11465118/

Abstract

IMPORTANCE

Machine learning tools are increasingly deployed for risk prediction and clinical decision support in surgery. Class imbalance adversely impacts predictive performance, especially for low-incidence complications.

OBJECTIVE

To evaluate risk-prediction model performance when trained on risk-specific cohorts.

DESIGN, SETTING, AND PARTICIPANTS

This cross-sectional study, performed from February 2024 to July 2024, deployed a deep learning model that generated risk scores for common postoperative complications. A total of 109 445 inpatient operations performed at 2 University of Florida Health hospitals from June 1, 2014, to May 5, 2021, were examined.

EXPOSURES

The model was trained de novo on separate cohorts for high-risk, medium-risk, and low-risk Current Procedural Terminology (CPT) codes, defined empirically by the incidence of 5 postoperative complications: (1) in-hospital mortality; (2) prolonged intensive care unit (ICU) stay (≥48 hours); (3) prolonged mechanical ventilation (≥48 hours); (4) sepsis; and (5) acute kidney injury (AKI). Low-risk and high-risk cutoffs for complications were defined by the lower-third and upper-third prevalence in the dataset, except for mortality, for which the cutoffs were set at 1% or less and greater than 3%, respectively.
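The stratification described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names, the placeholder procedure codes, and the toy counts are all hypothetical, and reading "lower-third and upper-third prevalence" as tertile cut points of the per-code incidence values is an assumption. Only the mortality cutoffs (1% or less, greater than 3%) come from the abstract.

```python
from collections import defaultdict
from statistics import quantiles

# From the abstract: mortality <=1% is low risk, >3% is high risk.
MORTALITY_CUTOFFS = (0.01, 0.03)


def incidence_by_code(records):
    """Per-code complication incidence from (procedure_code, event) pairs."""
    counts = defaultdict(lambda: [0, 0])  # code -> [events, total]
    for code, event in records:
        counts[code][0] += int(event)
        counts[code][1] += 1
    return {code: events / total for code, (events, total) in counts.items()}


def tertile_cutoffs(rates):
    """Assumed reading of the tertile rule: cut points that split the
    per-code incidence values into thirds (needs at least 2 values)."""
    low_cut, high_cut = quantiles(rates, n=3)
    return low_cut, high_cut


def assign_risk_class(rate, low_cut, high_cut):
    """Low if rate <= low_cut, high if rate > high_cut, otherwise medium."""
    if rate <= low_cut:
        return "low"
    if rate > high_cut:
        return "high"
    return "medium"


# Toy data with placeholder codes (hypothetical, not real CPT codes or counts).
records = (
    [("CODE_A", 1)] * 1 + [("CODE_A", 0)] * 99    # 1% mortality
    + [("CODE_B", 1)] * 2 + [("CODE_B", 0)] * 98  # 2% mortality
    + [("CODE_C", 1)] * 5 + [("CODE_C", 0)] * 95  # 5% mortality
)
rates = incidence_by_code(records)
classes = {
    code: assign_risk_class(rate, *MORTALITY_CUTOFFS)
    for code, rate in rates.items()
}
print(classes)
```

For the other four complications, `tertile_cutoffs(list(rates.values()))` would supply the two cut points in place of `MORTALITY_CUTOFFS`; whether boundary values fall in the lower or upper class is a choice the abstract does not specify.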

MAIN OUTCOMES AND MEASURES

Model performance metrics were assessed for each risk-specific cohort alongside the baseline model. Metrics included area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), F1 scores, and accuracy for each model.
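A small worked example shows why accuracy alone is not informative under class imbalance and why F1 is reported alongside it. The data below are synthetic (1% event rate, two hypothetical classifiers) and are not drawn from the study; the helper names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Synthetic cohort: 10 events among 1000 operations (1% prevalence).
y_true = [1] * 10 + [0] * 990

# Classifier 1 never predicts the complication.
always_negative = [0] * 1000

# Classifier 2 catches 5 of 10 events at the cost of 5 false alarms.
some_signal = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 985

print(accuracy(y_true, always_negative), f1_score(y_true, always_negative))
print(accuracy(y_true, some_signal), f1_score(y_true, some_signal))
```

Both classifiers reach 0.99 accuracy, but F1 is 0.0 for the first and 0.5 for the second. AUPRC is similarly sensitive to prevalence: a classifier with no discriminative ability has an expected AUPRC roughly equal to the event rate, so AUPRC values are best compared within a cohort rather than across cohorts with different prevalence.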

RESULTS

A total of 109 445 inpatient operations were examined among patients treated at 2 University of Florida Health hospitals in Gainesville (77 921 procedures [71.2%]) and Jacksonville (31 524 procedures [28.8%]). Median (IQR) patient age was 58 (43-68) years, and median (IQR) Charlson Comorbidity Index score was 2 (0-4). Of the 109 445 operations, 55 646 (50.8%) were performed in male patients, and 66 495 (60.8%) were nonemergent inpatient operations. Training on the high-risk cohort had a variable impact on AUROC but significantly improved AUPRC (as assessed by nonoverlapping 95% confidence intervals) for predicting mortality (0.53; 95% CI, 0.43-0.64), AKI (0.61; 95% CI, 0.58-0.65), and prolonged ICU stay (0.91; 95% CI, 0.89-0.92). It also significantly improved F1 score for mortality (0.42; 95% CI, 0.36-0.49), prolonged mechanical ventilation (0.55; 95% CI, 0.52-0.58), sepsis (0.46; 95% CI, 0.43-0.49), and AKI (0.57; 95% CI, 0.54-0.59). After controlling for baseline model performance on high-risk cohorts, AUPRC increased significantly for in-hospital mortality only (0.53; 95% CI, 0.42-0.65 vs 0.29; 95% CI, 0.21-0.40).
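The significance criterion used above, nonoverlapping 95% confidence intervals, reduces to a simple interval comparison. The sketch below applies it to the one pair of intervals the abstract reports side by side (mortality AUPRC, risk-specific vs baseline model on the high-risk cohort); the function name is illustrative, and how the authors computed the intervals is not stated in the abstract.

```python
def intervals_overlap(a, b):
    """True if closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return a_lo <= b_hi and b_lo <= a_hi


# Mortality AUPRC 95% CIs reported in the abstract for the high-risk cohort.
risk_specific_ci = (0.42, 0.65)  # model trained on the high-risk cohort
baseline_ci = (0.21, 0.40)       # baseline model evaluated on the same cohort

significant = not intervals_overlap(risk_specific_ci, baseline_ci)
print(significant)
```

Nonoverlap is a conservative rule: two estimates can differ significantly under a formal test even when their 95% intervals overlap slightly, so this check may understate the number of real improvements.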

CONCLUSION AND RELEVANCE

In this cross-sectional study, training separate models using a priori knowledge of procedure-specific risk classes improved performance on standard evaluation metrics, especially for low-prevalence complications like in-hospital mortality. Used cautiously, this approach may represent an optimal training strategy for surgical risk-prediction models.



