Enhancing the prediction of acute kidney injury risk after percutaneous coronary intervention using machine learning techniques: A retrospective cohort study.

Affiliations

Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, Connecticut, United States of America.

Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut, United States of America.

Publication information

PLoS Med. 2018 Nov 27;15(11):e1002703. doi: 10.1371/journal.pmed.1002703. eCollection 2018 Nov.

DOI:10.1371/journal.pmed.1002703

PMID:30481186

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6258473/

Abstract

BACKGROUND

The current acute kidney injury (AKI) risk prediction model for patients undergoing percutaneous coronary intervention (PCI) from the American College of Cardiology (ACC) National Cardiovascular Data Registry (NCDR) employed regression techniques. This study aimed to evaluate whether models using machine learning techniques could significantly improve AKI risk prediction after PCI.

METHODS AND FINDINGS

We used the same cohort and candidate variables used to develop the current NCDR CathPCI Registry AKI model, including 947,091 patients who underwent PCI procedures between June 1, 2009, and June 30, 2011. The mean age of these patients was 64.8 years, and 32.8% were women, with a total of 69,826 (7.4%) AKI events. We replicated the current AKI model as the baseline model and compared it with a series of new models. Temporal validation was performed using data from 970,869 patients undergoing PCIs between July 1, 2015, and March 31, 2017, with a mean age of 65.7 years; 31.9% were women, and 72,954 (7.5%) had AKI events. Each model was derived by implementing one of two strategies for preprocessing candidate variables (preselecting and transforming candidate variables or using all candidate variables in their original forms), one of three variable-selection methods (stepwise backward selection, lasso regularization, or permutation-based selection), and one of two methods to model the relationship between variables and outcome (logistic regression or gradient descent boosting). The cohort was divided into training (70%) and test (30%) sets using 100 different random splits, and the performance of the models was evaluated internally in the test sets. The best model, according to the internal evaluation, was derived by using all available candidate variables in their original form, permutation-based variable selection, and gradient descent boosting. Compared with the baseline model that uses 11 variables, the best model used 13 variables and achieved a significantly better area under the receiver operating characteristic curve (AUC) of 0.752 (95% confidence interval [CI] 0.749-0.754) versus 0.711 (95% CI 0.708-0.714), a significantly better Brier score of 0.0617 (95% CI 0.0615-0.0618) versus 0.0636 (95% CI 0.0634-0.0638), and a better calibration slope of observed versus predicted rate of 1.008 (95% CI 0.988-1.028) versus 1.036 (95% CI 1.015-1.056). The best model also had a significantly wider predictive range (25.3% versus 21.6%, p < 0.001) and was more accurate in stratifying AKI risk for patients. Evaluated on a more contemporary CathPCI cohort (July 1, 2015-March 31, 2017), the best model consistently achieved significantly better performance than the baseline model in AUC (0.785 versus 0.753), Brier score (0.0610 versus 0.0627), calibration slope (1.003 versus 1.062), and predictive range (29.4% versus 26.2%). The current study does not address implementation for risk calculation at the point of care, and potential challenges include the availability and accessibility of the predictors.
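The evaluation design described above (12 candidate pipelines from a 2 x 3 x 2 grid, scored by AUC and Brier score over 100 random 70/30 splits) can be illustrated with a minimal Python sketch. This is not the authors' code. All names (`PIPELINES`, `evaluate_split`, `simulate_cohort`) are hypothetical, the cohort is synthetic with made-up event rates, and a toy stratum-rate model stands in for logistic regression or gradient boosting so that the example needs only the standard library.

```python
import random
from itertools import product
from statistics import mean

# The 2 x 3 x 2 design from the abstract gives 12 candidate pipelines.
# The labels are illustrative; only the grid structure comes from the text.
PREPROCESSING = ["preselected_transformed", "all_original"]
SELECTION = ["stepwise_backward", "lasso", "permutation"]
MODEL = ["logistic_regression", "gradient_boosting"]
PIPELINES = list(product(PREPROCESSING, SELECTION, MODEL))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return mean((p - y) ** 2 for y, p in zip(y_true, y_prob))


def auc(y_true, y_prob):
    """AUC as the chance that a random event outranks a random non-event.

    Computed from average ranks (Mann-Whitney U), so ties count one half.
    """
    order = sorted(range(len(y_prob)), key=lambda i: y_prob[i])
    ranks = [0.0] * len(y_prob)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and y_prob[order[j + 1]] == y_prob[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def evaluate_split(records, seed, train_frac=0.7):
    """Fit the toy model on a random 70% and score it on the remaining 30%."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    train, test = shuffled[:cut], shuffled[cut:]
    # Toy "model" (an assumption): event rate per level of one binary factor.
    rate = {lvl: mean(y for x, y in train if x == lvl) for lvl in (0, 1)}
    y_true = [y for _, y in test]
    y_prob = [rate[x] for x, _ in test]
    return auc(y_true, y_prob), brier_score(y_true, y_prob)


def simulate_cohort(n, seed=0):
    """Synthetic (risk_factor, outcome) pairs; the rates are not registry data."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        x = 1 if rng.random() < 0.3 else 0
        y = 1 if rng.random() < (0.15 if x else 0.04) else 0
        cohort.append((x, y))
    return cohort


if __name__ == "__main__":
    cohort = simulate_cohort(5000)
    results = [evaluate_split(cohort, seed) for seed in range(100)]
    print(len(PIPELINES), "candidate pipelines")
    print("mean AUC over 100 splits:", round(mean(a for a, _ in results), 3))
    print("mean Brier over 100 splits:", round(mean(b for _, b in results), 4))
```

In a fuller version, each of the 12 pipelines would be refit inside every split and compared on the same test sets. The sketch scores a single toy model only to show how the splits and the two metrics fit together.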

CONCLUSIONS

Machine learning techniques and data-driven approaches resulted in improved prediction of AKI risk after PCI. The results support the potential of these techniques for improving risk prediction models and identification of patients who may benefit from risk-mitigation strategies.
