University Hospital Zurich and University of Zurich, Department of Radiation Oncology, Zurich, Switzerland.
GROW-School for Oncology and Developmental Biology, Maastricht University Medical Centre, Department of Precision Medicine, The D Lab: Decision Support for Precision Medicine, Maastricht, The Netherlands.
Sci Rep. 2020 Mar 11;10(1):4542. doi: 10.1038/s41598-020-61297-4.
A major challenge in radiomics is assembling data from multiple centers, because sharing data between hospitals is restricted by legal and ethical regulations. Distributed learning is a technique that enables models to be trained on multicenter data without the data leaving the hospitals ("privacy-preserving" distributed learning). This study tested the feasibility of distributed learning on radiomics data for predicting two-year overall survival and HPV status in head and neck cancer (HNC) patients. Pretreatment CT images were collected from 1174 HNC patients in 6 different cohorts. A total of 981 radiomic features were extracted using the Z-Rad software implementation. Hierarchical clustering was performed to preselect features, and classification was done using logistic regression. In the validation dataset, the receiver operating characteristic (ROC) curves were compared between the models trained in a centralized and in a distributed manner. No difference in ROC was observed with respect to feature selection. The logistic regression coefficients were identical between the methods (absolute difference <10). In the comparison of the full workflow (feature selection and classification), no significant difference in ROC was found between the centralized and distributed models for either studied endpoint (DeLong p > 0.05). In conclusion, both feature selection and classification are feasible in a distributed manner using radiomics data, which opens new possibilities for training more reliable radiomics models.
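The abstract does not state which algorithm was used for the distributed logistic regression. As a minimal sketch only, assuming a simple scheme in which each hospital shares nothing but its summed gradient and row count, the following Python example illustrates why distributed and centralized training can yield matching coefficients. The data are synthetic, and every name here (`local_gradient`, `fit`, `make_site`, the learning rate, the three-site setup) is a hypothetical choice for illustration, not the study's implementation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_gradient(rows, labels, w):
    """Runs inside one hospital; returns only the summed gradient and row count."""
    grad = [0.0] * len(w)
    for x, y in zip(rows, labels):
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return grad, len(rows)


def fit(sites, n_features, lr=0.5, n_iter=500):
    """Full-batch gradient descent on the aggregated per-site gradients."""
    w = [0.0] * n_features
    for _ in range(n_iter):
        total = [0.0] * n_features
        n = 0
        for rows, labels in sites:
            g, m = local_gradient(rows, labels, w)
            total = [t + gi for t, gi in zip(total, g)]
            n += m
        w = [wi - lr * t / n for wi, t in zip(w, total)]
    return w


def make_site(rng, n_rows, true_w):
    """Synthetic stand-in for one cohort: intercept plus two features."""
    rows, labels = [], []
    for _ in range(n_rows):
        x = [1.0, rng.gauss(0, 1), rng.gauss(0, 1)]
        p = sigmoid(sum(wi * xi for wi, xi in zip(true_w, x)))
        rows.append(x)
        labels.append(1.0 if rng.random() < p else 0.0)
    return rows, labels


rng = random.Random(0)
true_w = [-0.5, 1.0, -2.0]  # assumed coefficients for the synthetic data
sites = [make_site(rng, 200, true_w) for _ in range(3)]

# Distributed: three sites, only gradients are exchanged.
w_distributed = fit(sites, n_features=3)

# Centralized: the same rows pooled into a single dataset.
pooled = ([x for rows, _ in sites for x in rows],
          [y for _, labels in sites for y in labels])
w_centralized = fit([pooled], n_features=3)

max_abs_diff = max(abs(a - b) for a, b in zip(w_distributed, w_centralized))
print(max_abs_diff)
```

Because the pooled gradient is just the sum of the per-site gradients, the two runs differ only through floating-point summation order. A real deployment would add secure communication and typically a more elaborate optimizer, both of which this sketch omits.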