Health Services Research Unit, Division of Medicine, Singapore General Hospital, Singapore.
Health Services Research Centre, Singapore Health Services, Singapore.
Ann Surg. 2020 Dec;272(6):1133-1139. doi: 10.1097/SLA.0000000000003297.
To compare the performance of machine learning models against the traditionally derived Combined Assessment of Risk Encountered in Surgery (CARES) model and the American Society of Anaesthesiologists-Physical Status (ASA-PS) in the prediction of 30-day postsurgical mortality and need for intensive care unit (ICU) stay >24 hours.
Prediction of surgical risk preoperatively is important for clinical shared decision-making and planning of health resources such as ICU beds. The current growth of electronic medical records coupled with machine learning presents an opportunity to improve the performance of established risk models.
All patients aged 18 years and above who underwent noncardiac and nonneurological surgery at Singapore General Hospital (SGH) between 1 January 2012 and 31 October 2016 were included. Patient demographics, comorbidities, preoperative laboratory results, and surgery details were obtained from their electronic medical records. Seventy percent of the observations were randomly selected for training, leaving 30% for testing. Baseline models were CARES and ASA-PS. Candidate models were trained using random forest, adaptive boosting, gradient boosting, and support vector machine. Models were evaluated on area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC).
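The evaluation design described above (a random 70/30 split, then scoring by AUROC and AUPRC) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names `train_test_split`, `auroc`, and `average_precision` are hypothetical, the seed is arbitrary, and average precision is used here as one common step-wise estimate of AUPRC, which may differ from the estimator used in the study.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of `rows` and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary assumption
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def auroc(labels, scores):
    """AUROC via the rank-sum identity; tied scores share an average rank.

    Requires at least one positive (1) and one negative (0) label.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    n_pos = sum(labels)
    n_neg = n - n_pos
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Step-wise AUPRC estimate: mean precision at each true positive.

    Tied scores are ranked in input order, a simplification of this sketch.
    """
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    n_pos = sum(labels)
    hits = 0
    total = 0.0
    for rank, k in enumerate(order, start=1):
        if labels[k] == 1:
            hits += 1
            total += hits / rank
    return total / n_pos
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75, while `average_precision` on the same inputs gives about 0.83.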
A total of 90,785 patients were included, of whom 539 (0.6%) died within 30 days and 1264 (1.4%) required ICU admission >24 hours postoperatively. Baseline models achieved high AUROCs despite poor sensitivities, as they predicted all cases as negative in a predominantly negative dataset. Gradient boosting was the best-performing model, with AUPRCs of 0.23 and 0.38 for the mortality and ICU admission outcomes, respectively.
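To put the reported AUPRCs in context, the short calculation below derives the event prevalence from the counts above and compares each AUPRC against it. It rests on two assumptions not stated in the abstract: that a chance-level classifier has an expected AUPRC approximately equal to the positive-class prevalence, and that the prevalence in the 30% test set is close to that of the whole cohort. The function names are hypothetical.

```python
# Counts and AUPRCs as reported in the abstract.
N_PATIENTS = 90_785
N_DEATHS = 539
N_ICU = 1_264
AUPRC_MORTALITY = 0.23
AUPRC_ICU = 0.38


def prevalence(n_events, n_total):
    """Fraction of patients with the outcome."""
    return n_events / n_total


def all_negative_accuracy(n_events, n_total):
    """Accuracy of a rule that predicts 'no event' for every patient.

    Such a rule has a sensitivity of zero by construction.
    """
    return (n_total - n_events) / n_total


def lift_over_chance(auprc, n_events, n_total):
    """Ratio of an AUPRC to the assumed chance-level AUPRC (the prevalence)."""
    return auprc / prevalence(n_events, n_total)


if __name__ == "__main__":
    for name, events, auprc in [
        ("30-day mortality", N_DEATHS, AUPRC_MORTALITY),
        ("ICU stay >24 h", N_ICU, AUPRC_ICU),
    ]:
        print(
            f"{name}: prevalence {prevalence(events, N_PATIENTS):.2%}, "
            f"all-negative accuracy {all_negative_accuracy(events, N_PATIENTS):.2%}, "
            f"AUPRC lift over chance {lift_over_chance(auprc, events, N_PATIENTS):.1f}x"
        )
```

Under these assumptions, an AUPRC of 0.23 for mortality is roughly 39 times the chance level and 0.38 for ICU admission is roughly 27 times, even though both values look low in absolute terms. The same counts show that a rule predicting no events at all would be over 98% accurate for either outcome while detecting no cases.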
Machine learning can be used to improve surgical risk prediction compared to traditional risk calculators. AUPRC should be used to evaluate model predictive performance instead of AUROC when the dataset is imbalanced.