University Division of Anaesthesia, Department of Medicine, Addenbrooke's Hospital, Hills Road, Cambridge, CB2 0QQ, United Kingdom.
PLoS One. 2020 Feb 3;15(2):e0226480. doi: 10.1371/journal.pone.0226480. eCollection 2020.
Cardiopulmonary exercise testing (CPET) is widely used within the United Kingdom for preoperative risk stratification. Despite this, CPET's performance in predicting adverse events has not been systematically evaluated within the framework of classifier performance.
After prospective registration on PROSPERO (CRD42018095508), we systematically identified studies in which CPET was used to aid the prognostication of mortality, cardiorespiratory complications, and unplanned intensive care unit (ICU) admission in individuals undergoing non-cardiopulmonary surgery. For all included studies we extracted or calculated measures of predictive performance, and we identified and critiqued predictive models that incorporated CPET-derived variables.
We identified 36 studies for qualitative review, from 27 of which measures of classifier performance could be calculated. We found studies to be highly heterogeneous in methodology and quality, with high potential for bias and confounding. Seven studies presented risk prediction models for outcomes of interest. Of these, only four outlined a clear process of model development; discrimination and calibration were assessed in only two, and only one undertook internal validation. No risk score was externally validated. The measures of test performance that we identified or calculated for CPET were mixed. Data were most complete for anaerobic threshold (AT) based predictions: calculated sensitivities for predicting mortality ranged from 20% to 100%, with high negative predictive values (96-100%). In contrast, positive predictive value (PPV) was poor (2.9-42.1%). PPV appeared to be generally higher for cardiorespiratory complications, with similar sensitivities. Similar patterns were seen for the association of peak VO2 (sensitivity 85.7-100%, PPV 2.7-5.9%) and VE/VCO2 (sensitivity 27.8-100%, PPV 3.4-7.1%) with mortality.
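The classifier measures reported above (sensitivity, PPV, negative predictive value) all derive from a 2x2 table of test result against outcome. The following is a minimal sketch of that calculation. The function name and the patient counts are hypothetical and chosen only for illustration; they are not taken from any included study.

```python
def classifier_metrics(tp, fp, fn, tn):
    """Compute standard classifier measures from 2x2 table counts.

    tp: high-risk on CPET, adverse outcome occurred
    fp: high-risk on CPET, no adverse outcome
    fn: low-risk on CPET, adverse outcome occurred
    tn: low-risk on CPET, no adverse outcome
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical cohort (assumption, not study data): 200 patients, 4 deaths,
# 60 patients flagged as high risk by a low anaerobic threshold.
metrics = classifier_metrics(tp=3, fp=57, fn=1, tn=139)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this made-up cohort, sensitivity and NPV are high while PPV is low, which is the same qualitative pattern as the ranges reported for AT and mortality.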
In general, CPET's 'rule-out' capability appears better than its ability to 'rule in' complications. Poor PPV may reflect the low frequency of complications in the studied populations. Our calculated estimates of classifier performance suggest the need for a balanced interpretation of the pros and cons of CPET-guided preoperative risk stratification.
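The dependence of PPV on how often complications occur follows from Bayes' theorem: for fixed sensitivity and specificity, PPV falls and NPV rises as the outcome becomes rarer. The sketch below illustrates this. The sensitivity (0.8), specificity (0.7), and prevalence values are assumptions chosen for illustration, not estimates from the review.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity, specificity, prevalence):
    """Negative predictive value via Bayes' theorem."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Assumed test characteristics (illustrative only).
SENS, SPEC = 0.8, 0.7

# Compare a rare outcome (e.g. mortality) with a more common one
# (e.g. cardiorespiratory complications); prevalences are hypothetical.
for prevalence in (0.02, 0.20):
    print(
        f"prevalence={prevalence:.2f}  "
        f"PPV={ppv(SENS, SPEC, prevalence):.3f}  "
        f"NPV={npv(SENS, SPEC, prevalence):.3f}"
    )
```

With the same test characteristics, moving from a 2% to a 20% event rate raises PPV from roughly 5% to 40%, which is consistent with the observation that PPV was generally higher for cardiorespiratory complications than for mortality.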