Qin Yuchao, Alaa Ahmed, Floto Andres, Schaar Mihaela van der
University of Cambridge, Cambridge, United Kingdom.
University of California Berkeley, Berkeley, California, United States of America.
PLOS Digit Health. 2023 Jan 12;2(1):e0000179. doi: 10.1371/journal.pdig.0000179. eCollection 2023 Jan.
Precise and timely referral for lung transplantation is critical for the survival of cystic fibrosis patients with terminal illness. While machine learning (ML) models have been shown to achieve significant improvement in prognostic accuracy over current referral guidelines, the external validity of these models and their resulting referral policies has not been fully investigated. Here, we studied the external validity of ML-based prognostic models using annual follow-up data from the UK and Canadian Cystic Fibrosis Registries. Using a state-of-the-art automated ML framework, we derived a model for predicting poor clinical outcomes in patients enrolled in the UK registry, and conducted external validation of the derived model using the Canadian Cystic Fibrosis Registry. In particular, we studied the effect of (1) natural variations in patient characteristics across populations and (2) differences in clinical practice on the external validity of ML-based prognostic scores. Overall, a decrease in prognostic accuracy was observed on the external validation set (AUCROC: 0.88, 95% CI 0.88-0.88) compared to the internal validation accuracy (AUCROC: 0.91, 95% CI 0.90-0.92). Based on our ML model, analysis of feature contributions and risk strata revealed that, while external validation of ML models exhibited high precision on average, both factors (1) and (2) can undermine the external validity of ML models in patient subgroups with moderate risk for poor outcomes. A significant boost in prognostic power (F1 score) from 0.33 (95% CI 0.31-0.35) to 0.45 (95% CI 0.45-0.45) was observed in external validation when variations in these subgroups were accounted for in our model. Our study highlighted the significance of external validation of ML models for cystic fibrosis prognostication. The insights uncovered on key risk factors and patient subgroups can be used to guide the cross-population adaptation of ML-based models and inspire new research on applying transfer learning methods for fine-tuning ML models to cope with regional variations in clinical care.
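The comparison of internal and external validation metrics reported above can be illustrated with a minimal sketch. It is not the authors' pipeline: the cohorts are synthetic stand-ins for the UK and Canadian registries, the percentile bootstrap is one common (assumed) way to obtain 95% confidence intervals, and all names (`make_cohort`, `bootstrap_ci`, the 0.2 prevalence, the separation values) are hypothetical.

```python
import random

def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1_score(labels, scores, threshold):
    """F1 of the binary rule 'score >= threshold'."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def bootstrap_ci(metric, labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for metric(labels, scores)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # skip resamples that contain a single class
            continue
        stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

def make_cohort(n, separation, seed):
    """Synthetic cohort: ~20% poor outcomes, risk scores shifted by `separation`."""
    rng = random.Random(seed)
    labels = [int(rng.random() < 0.2) for _ in range(n)]
    scores = [rng.gauss(separation * y, 1.0) for y in labels]
    return labels, scores

# Hypothetical cohorts: the external one has weaker class separation,
# mimicking a drop in discrimination under population shift.
internal = make_cohort(300, separation=2.0, seed=1)
external = make_cohort(300, separation=1.5, seed=2)

for name, (y, s) in [("internal", internal), ("external", external)]:
    lo, hi = bootstrap_ci(auroc, y, s)
    f1 = f1_score(y, s, threshold=1.0)
    print(f"{name}: AUROC={auroc(y, s):.2f} (95% CI {lo:.2f}-{hi:.2f}), F1={f1:.2f}")
```

The sketch evaluates a fixed risk score on both cohorts without refitting, which is the essence of external validation; the subgroup-aware adjustment described in the abstract is not reproduced here.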