Kossen Tabea, Hirzel Manuel A, Madai Vince I, Boenisch Franziska, Hennemuth Anja, Hildebrand Kristian, Pokutta Sebastian, Sharma Kartikey, Hilbert Adam, Sobesky Jan, Galinovic Ivana, Khalil Ahmed A, Fiebach Jochen B, Frey Dietmar
CLAIM-Charité Lab for AI in Medicine, Charité Universitätsmedizin Berlin, Berlin, Germany.
Department of Computer Engineering and Microelectronics, Computer Vision & Remote Sensing, Technical University Berlin, Berlin, Germany.
Front Artif Intell. 2022 May 2;5:813842. doi: 10.3389/frai.2022.813842. eCollection 2022.
Sharing labeled data is crucial for acquiring the large datasets that many Deep Learning applications require. In medical imaging, this is often not feasible due to privacy regulations. Anonymization could be a solution, but standard techniques have been shown to be partially reversible. Synthetic data generated by a Generative Adversarial Network (GAN) with differential privacy guarantees could instead protect patient privacy while preserving the predictive properties of the data. In this study, we implemented a Wasserstein GAN (WGAN) with and without differential privacy guarantees to generate privacy-preserving, labeled Time-of-Flight Magnetic Resonance Angiography (TOF-MRA) image patches for brain vessel segmentation. The synthesized image-label pairs were used to train a U-Net, whose segmentation performance was evaluated on real patient images from two different datasets. Additionally, the Fréchet Inception Distance (FID) between the generated and the real images was calculated to assess their similarity. Using both the U-Net and the FID, we explored the effect of different privacy levels, represented by the privacy parameter ϵ. With stricter privacy guarantees, both the segmentation performance and the similarity to real patient images in terms of FID decreased. Our best segmentation model trained on synthetic, differentially private data achieved a Dice Similarity Coefficient (DSC) of 0.75 for ϵ = 7.4, compared to 0.84 for ϵ = ∞, in a brain vessel segmentation paradigm (DSC of 0.69 and 0.88 on the second test set, respectively). We found that for ϵ < 5, performance (DSC < 0.61) became unstable and unusable. Our synthesized labeled TOF-MRA images with strict privacy guarantees retained the predictive properties necessary for segmenting the brain vessels.
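As a minimal sketch of the metric behind the DSC values above, the snippet below implements the standard definition DSC = 2|P ∩ T| / (|P| + |T|) for a predicted mask P and a reference mask T. It assumes both masks have been flattened into equal-length sequences of 0/1 voxel labels. The function name dice_coefficient and the convention of returning 1.0 when both masks are empty are illustrative assumptions, not details taken from the study.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice Similarity Coefficient of two binary masks given as flat 0/1 sequences.

    Hypothetical helper for illustration; the voxels of a 3D patch are assumed
    to have been flattened beforehand.
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    # |P ∩ T|: voxels labeled as vessel in both masks
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # |P| + |T|: number of vessel voxels in each mask, added together
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Both masks empty: treated as perfect agreement (an assumed convention)
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    print(dice_coefficient([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

For the toy masks in the usage line, two voxels overlap and each mask has three vessel voxels, so the DSC is 2·2 / (3 + 3) ≈ 0.67.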
Although further research is needed on generalizability to other imaging modalities and on improving performance, our results mark an encouraging first step toward privacy-preserving data sharing in medical imaging.
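The abstract does not state how the privacy level ϵ is enforced during WGAN training. A common approach is DP-SGD (Abadi et al., 2016): each example's gradient is clipped to a fixed L2 norm, and Gaussian noise scaled to that bound is added before the update. In the usual DP-GAN setup only the critic sees real images, so only its updates are privatized, and the generator inherits the guarantee by post-processing. The sketch below shows a single such update in pure Python. The function names, the flat-list parameter representation, and the choice of DP-SGD itself are assumptions for illustration, not the authors' implementation. Converting the noise multiplier, sampling rate, and number of steps into a concrete ϵ (such as 7.4) requires a privacy accountant, which is omitted here.

```python
import math
import random
from typing import List, Sequence


def clip_gradient(grad: Sequence[float], max_norm: float) -> List[float]:
    """Rescale one example's gradient so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(
    params: Sequence[float],
    per_example_grads: Sequence[Sequence[float]],
    lr: float,
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """One DP-SGD update (hypothetical sketch): clip, sum, add noise, average.

    Assumed usage in a DP-WGAN: applied to the critic's parameters only; the
    generator is trained solely through the critic and gets no extra noise.
    """
    batch_size = len(per_example_grads)
    summed = [0.0] * len(params)
    for grad in per_example_grads:
        for i, g in enumerate(clip_gradient(grad, max_norm)):
            summed[i] += g
    # Clipping bounds each example's contribution (the sensitivity) by max_norm,
    # so the Gaussian noise standard deviation is scaled by the same bound.
    sigma = noise_multiplier * max_norm
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]


if __name__ == "__main__":
    rng = random.Random(0)
    params = [0.0, 0.0]
    grads = [[3.0, 4.0], [0.0, 1.0]]  # toy per-example gradients
    print(dp_sgd_step(params, grads, lr=0.1, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

For a fixed number of training steps, a larger noise_multiplier yields a smaller ϵ, i.e. stricter privacy. This is consistent with the trade-off the abstract reports: noisier critic updates come with lower image similarity and segmentation performance.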