Yang Jenny, Soltan Andrew A S, Clifton David A
Institute of Biomedical Engineering, Dept. Engineering Science, University of Oxford, Oxford, UK.
John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, UK.
NPJ Digit Med. 2022 Jun 7;5(1):69. doi: 10.1038/s41746-022-00614-9.
As patient health information is highly regulated due to privacy concerns, most machine learning (ML)-based healthcare studies are unable to test on external patient cohorts, resulting in a gap between locally reported model performance and cross-site generalizability. Different approaches have been introduced for developing models across multiple clinical sites; however, less attention has been given to adopting ready-made models in new settings. We introduce three methods to do this: (1) applying a ready-made model "as-is"; (2) readjusting the decision threshold on the model's output using site-specific data; and (3) finetuning the model using site-specific data via transfer learning. Using a case study of COVID-19 diagnosis across four NHS Hospital Trusts, we show that all methods achieve clinically effective performance (NPV > 0.959), with transfer learning achieving the best results (mean AUROCs between 0.870 and 0.925). Our models demonstrate that site-specific customization improves predictive performance when compared to other ready-made approaches.
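As an illustration of method (2), the sketch below shows one way a decision threshold could be readjusted on site-specific data while the model itself stays frozen. The abstract does not state how the threshold is chosen, so the selection rule (meeting a target sensitivity), the function names `threshold_for_sensitivity` and `sensitivity_and_npv`, and the toy data are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def threshold_for_sensitivity(y_true, scores, target_sensitivity):
    """Largest threshold whose sensitivity on the given data meets the target.

    Hypothetical helper: the selection rule is an assumption, not taken
    from the paper.
    """
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    # Scores the frozen model assigned to true positives, highest first.
    pos_scores = np.sort(scores[y_true])[::-1]
    if pos_scores.size == 0:
        raise ValueError("need at least one positive case to set a threshold")
    # Number of positive cases that must be flagged to reach the target.
    k = max(int(np.ceil(target_sensitivity * pos_scores.size)), 1)
    return float(pos_scores[k - 1])


def sensitivity_and_npv(y_true, scores, threshold):
    """Sensitivity and negative predictive value at a given threshold."""
    y_true = np.asarray(y_true).astype(bool)
    pred = np.asarray(scores, dtype=float) >= threshold
    tp = np.sum(pred & y_true)
    fn = np.sum(~pred & y_true)
    tn = np.sum(~pred & ~y_true)
    sensitivity = tp / (tp + fn)
    npv = tn / (tn + fn) if (tn + fn) > 0 else float("nan")
    return float(sensitivity), float(npv)


# Toy site-specific calibration set (made-up labels and model outputs).
site_labels = [1, 1, 1, 1, 0, 0, 0, 0]
site_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

site_threshold = threshold_for_sensitivity(site_labels, site_scores, 0.75)
site_sens, site_npv = sensitivity_and_npv(site_labels, site_scores, site_threshold)
```

In practice the threshold would be chosen on a held-out calibration split at the new site and then evaluated on separate test data from that site, so the reported sensitivity and NPV are not optimistic.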