From the Division of Nuclear Medicine and Molecular Imaging, Geneva University Hospital.
Department of Computer Science.
Clin Nucl Med. 2022 Jul 1;47(7):606-617. doi: 10.1097/RLU.0000000000004194. Epub 2022 Apr 20.
The generalizability and trustworthiness of deep learning (DL)-based algorithms depend on the size and heterogeneity of training datasets. However, because of patient privacy concerns and ethical and legal issues, sharing medical images between different centers is restricted. Our objective is to build a federated DL-based framework for PET image segmentation utilizing a multicentric dataset and to compare its performance with the centralized DL approach.
PET images from 405 head and neck cancer patients from 9 different centers formed the basis of this study. All tumors were segmented manually. PET images converted to SUV maps were resampled to isotropic voxels (3 × 3 × 3 mm³) and then normalized. PET image subvolumes (12 × 12 × 12 cm³) consisting of whole tumors and background were analyzed. Data from each center were divided into train/validation (80% of patients) and test sets (20% of patients). The modified R2U-Net was used as the core DL model. A parallel federated DL model was developed and compared with the centralized approach, in which the datasets are pooled on one server. Segmentation metrics, including Dice similarity and Jaccard coefficients, and percent relative errors (RE%) of SUVpeak, SUVmean, SUVmedian, SUVmax, metabolic tumor volume, and total lesion glycolysis were computed and compared with manual delineations.
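The evaluation metrics named above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy masks, and the use of an absolute (unsigned) relative error are assumptions made for illustration. The voxel volume follows from the 3 × 3 × 3 mm³ resampling stated above (0.027 mL per voxel), and metabolic tumor volume and total lesion glycolysis use their conventional definitions (segmented volume, and SUVmean times that volume).

```python
import numpy as np

VOXEL_VOLUME_ML = 3 * 3 * 3 / 1000.0  # 3 x 3 x 3 mm^3 voxel expressed in mL


def dice_coefficient(pred, ref):
    """Dice similarity between two binary masks: 2|A and B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)


def jaccard_coefficient(pred, ref):
    """Jaccard index between two binary masks: |A and B| / |A or B|."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    union = np.logical_or(pred, ref).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    return float(np.logical_and(pred, ref).sum() / union)


def metabolic_tumor_volume(mask):
    """Segmented volume in mL: number of voxels times voxel volume."""
    return float(np.asarray(mask, dtype=bool).sum() * VOXEL_VOLUME_ML)


def total_lesion_glycolysis(suv, mask):
    """Conventional TLG: SUVmean inside the mask times the segmented volume."""
    mask = np.asarray(mask, dtype=bool)
    return float(np.asarray(suv)[mask].mean() * metabolic_tumor_volume(mask))


def relative_error_percent(predicted, reference):
    """Absolute percent relative error (the unsigned form is an assumption)."""
    return 100.0 * abs(predicted - reference) / abs(reference)


# Toy example: an 8-voxel manual mask and a 12-voxel predicted mask.
ref = np.zeros((4, 4, 4), dtype=bool)
ref[1:3, 1:3, 1:3] = True
pred = np.zeros((4, 4, 4), dtype=bool)
pred[1:3, 1:3, 1:4] = True

dice = dice_coefficient(pred, ref)        # 2 * 8 / (12 + 8) = 0.8
jaccard = jaccard_coefficient(pred, ref)  # 8 / 12
mtv_error = relative_error_percent(
    metabolic_tumor_volume(pred), metabolic_tumor_volume(ref)
)                                         # (12 - 8) / 8 = 50%
```

In this toy case the predicted mask overshoots the manual one by a single slice, which lowers Dice only to 0.8 but produces a 50% volume error, showing why overlap metrics and quantitative PET parameters are reported separately.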
The performance of the centralized versus federated DL methods was nearly identical for segmentation metrics: Dice (0.84 ± 0.06 vs 0.84 ± 0.05) and Jaccard (0.73 ± 0.08 vs 0.73 ± 0.07). For quantitative PET parameters, we obtained comparable RE% for SUVmean (6.43% ± 4.72% vs 6.61% ± 5.42%), metabolic tumor volume (12.2% ± 16.2% vs 12.1% ± 15.89%), and total lesion glycolysis (6.93% ± 9.6% vs 7.07% ± 9.85%) and negligible RE% for SUVmax and SUVpeak. No significant differences in performance (P > 0.05) between the 2 frameworks (centralized vs federated) were observed.
The developed federated DL model achieved quantitative performance comparable to that of the centralized DL model. Federated DL models could provide robust and generalizable segmentation while addressing patient privacy and the legal and ethical issues of clinical data sharing.