Laboratory of Radiation Physics, Odense University Hospital, Odense, Denmark; Department of Clinical Research, University of Southern Denmark, Odense, Denmark; Danish Centre for Particle Therapy, Aarhus University Hospital, Denmark; Institute of Medical Physics, School of Physics, University of Sydney, Sydney, Australia.
Radiotherapy department, The Christie NHS Foundation Trust, Manchester, United Kingdom.
Radiother Oncol. 2022 Nov;176:179-186. doi: 10.1016/j.radonc.2022.09.023. Epub 2022 Oct 5.
Federated learning has the potential to perform analyses on decentralised data; however, survival analyses face obstacles because of the risk of data leakage. This study demonstrates how to perform a stratified Cox regression survival analysis, specifically designed to avoid data leakage, using federated learning on larynx cancer patients from centres in three different countries.
Data were obtained from 1821 larynx cancer patients treated with radiotherapy in three centres. Tumour volume was available for all 786 patients included in the analysis. Parameter selection among eleven clinical and radiotherapy parameters was performed using best subset selection and cross-validation through the federated learning system AusCAT. After parameter selection, the β regression coefficients were estimated using bootstrap. Calibration plots were generated for 2- and 5-year survival, and Kaplan-Meier curves for the inner and outer risk groups were compared to the Cox model prediction.
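The idea behind a stratified Cox fit on decentralised data can be sketched as follows. This is a minimal illustration, not the AusCAT implementation and not the leakage safeguards used in the study: it assumes each centre is its own stratum, uses the Breslow convention for tied times, and only shows that the partial log-likelihood and its gradient are sums of per-centre aggregates. The function names `local_loglik_and_grad` and `federated_loglik_and_grad` are hypothetical.

```python
import math

def local_loglik_and_grad(times, events, X, beta):
    """Cox partial log-likelihood and gradient for ONE centre (one stratum).

    Assumption for this sketch: only the returned aggregates, never the
    patient rows, would be sent to the coordinating server.
    """
    n, p = len(times), len(beta)
    # Visit patients from longest to shortest follow-up so that the
    # risk set grows as we iterate.
    order = sorted(range(n), key=lambda i: -times[i])
    loglik, grad = 0.0, [0.0] * p
    s0 = 0.0          # sum of exp(x.beta) over the current risk set
    s1 = [0.0] * p    # sum of x * exp(x.beta) over the current risk set
    i = 0
    while i < n:
        t = times[order[i]]
        j = i
        # Add every patient tied at time t to the risk set (Breslow ties).
        while j < n and times[order[j]] == t:
            k = order[j]
            w = math.exp(sum(b * x for b, x in zip(beta, X[k])))
            s0 += w
            for a in range(p):
                s1[a] += X[k][a] * w
            j += 1
        # Each event at time t contributes one term to the likelihood.
        for m in range(i, j):
            k = order[m]
            if events[k]:
                loglik += sum(b * x for b, x in zip(beta, X[k])) - math.log(s0)
                for a in range(p):
                    grad[a] += X[k][a] - s1[a] / s0
        i = j
    return loglik, grad

def federated_loglik_and_grad(centres, beta):
    """Sum the per-centre aggregates; risk sets never mix across centres."""
    total, grad = 0.0, [0.0] * len(beta)
    for times, events, X in centres:
        ll, g = local_loglik_and_grad(times, events, X, beta)
        total += ll
        grad = [a + b for a, b in zip(grad, g)]
    return total, grad
```

Because the baseline hazard is estimated separately within each stratum, a server could in principle optimise β using only these summed quantities; how the study guards the exchanged quantities against leakage is not described in the abstract and is not modelled here.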
The best-performing Cox model included log(GTV), performance status, age, smoking, haemoglobin and N-classification; however, the simplest model with similar statistical prediction power included log(GTV) and performance status only. The Harrell C-indices for the simplest model were 0.75 [0.71-0.78], 0.65 [0.59-0.71] and 0.69 [0.59-0.77] for Odense, Christie and Liverpool, respectively. The full model gave slightly higher C-indices of 0.77 [0.74-0.80], 0.67 [0.62-0.73] and 0.71 [0.61-0.80], respectively. A patient who smokes during treatment has the same hazard as a non-smoking patient who is ten years older.
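For readers unfamiliar with the Harrell C-index reported above, the following is a minimal sketch of how it is commonly computed for right-censored data. It is not the study's code; the function name `harrell_c` is hypothetical, and the sketch assumes that a higher risk score means a worse prognosis, that pairs with tied times are skipped, and that tied risk scores count as half-concordant.

```python
def harrell_c(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    times  : follow-up time per patient
    events : 1 if the event (death) was observed, 0 if censored
    risk   : model risk score, e.g. the Cox linear predictor x.beta
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            # The pair is comparable when patient i had the event
            # strictly before patient j's last follow-up time.
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # earlier event had the higher risk
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the reported per-centre values between 0.65 and 0.77 in context.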
Without any patient-specific data leaving the hospitals, a stratified Cox regression model based on data from centres in three countries was developed without data leakage risks. The overall survival model is primarily driven by tumour volume and performance status.