Department of Biomedical Engineering, Linköping University, Linköping, Sweden.
Center for Medical Image Science and Visualization (CMIV), Linköping University, Linköping, Sweden.
Sci Data. 2024 Feb 29;11(1):259. doi: 10.1038/s41597-024-03073-x.
Large annotated datasets are required for training deep learning models, but in medical imaging data sharing is often complicated by ethics, anonymization, and data protection legislation. Generative AI models, such as generative adversarial networks (GANs) and diffusion models, can today produce very realistic synthetic images and can potentially facilitate data sharing. However, before synthetic medical images can be shared, it must first be demonstrated that they can be used to train different networks with acceptable performance. Here, we therefore comprehensively evaluate four GANs (progressive GAN, StyleGAN 1-3) and a diffusion model for the task of brain tumor segmentation, using two segmentation networks (U-Net and a Swin transformer). Our results show that segmentation networks trained on synthetic images reach Dice scores that are 80%-90% of those obtained when training with real images, but that memorization of the training images can be a problem for diffusion models if the original dataset is too small. We conclude that sharing synthetic medical images is a viable alternative to sharing real images, although further work is required. The trained generative models and the generated synthetic images are shared on the AIDA data hub.
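For reference, the Dice score used to compare the segmentation networks measures the overlap between a predicted mask and a ground-truth mask: 2|A∩B| / (|A| + |B|). The sketch below is a minimal, illustrative implementation for binary masks (it is not the paper's evaluation code, and the function name and empty-mask convention are our own assumptions):

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|).

    Illustrative only; the paper's actual evaluation pipeline is not shown here.
    """
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks empty: define as perfect agreement (a common convention).
        return 1.0
    return 2.0 * intersection / denom

# Small example: 2 overlapping pixels, 3 pixels in each mask -> 2*2/(3+3)
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(dice_score(pred, target))  # → 0.6666666666666666
```

A reported synthetic-to-real ratio of 80%-90% would then mean, for example, that a network trained on synthetic images reaching Dice 0.72 is compared against one trained on real images reaching Dice 0.80-0.90.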