Liang Zhaohui, Xue Zhiyun, Rajaraman Sivaramakrishnan, Antani Sameer
Computational Health Research Branch, National Library of Medicine, NIH.
Proc IEEE Southwest Symp Image Anal Interpret. 2024 Mar;2024:21-24. doi: 10.1109/ssiai59505.2024.10508671. Epub 2024 Apr 29.
In this study, we fine-tuned a Stable Diffusion model with a class-specific prior preservation strategy to synthesize high-resolution (512×512) chest X-ray images showing bilateral lung edema caused by COVID-19 pneumonia. Three hundred positive images from the MIDRC dataset served as subject instances, and an additional 400 negative images were used for class prior preservation. For comparison, we synthesized images with both the new technique and conventional techniques. Measured against the real positive images, the images synthesized by the Stable Diffusion model fine-tuned with prior preservation achieved a Fréchet inception distance (FID) of 9.2158 and a kernel inception distance (KID) of 0.0818, outperforming images synthesized by conventional methods such as WGAN and DDIM. When a trained vision transformer (ViT) classified the synthetic positive images together with the real negative images, it achieved an accuracy of 0.9975, a precision of 1.0, and a recall of 0.9950. We conclude that, with the prior preservation strategy, a Stable Diffusion model can synthesize high-quality, high-resolution chest X-ray images using only a small number of real images as subject instances and text prompts as guidance for the designated patterns.
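The abstract names a class-specific prior preservation strategy but does not spell out its objective. The sketch below assumes the DreamBooth-style formulation: the usual denoising loss on the subject (instance) images plus a weighted denoising loss on class images, which in this study would be the 400 negative images. It uses NumPy only; `eps_model`, the conditioning inputs, the single scalar `alpha_bar`, and `prior_weight` are hypothetical placeholders, and the VAE latent encoding, per-sample timestep sampling, text encoder, and optimizer step of a real Stable Diffusion pipeline are omitted.

```python
import numpy as np


def add_noise(x0, noise, alpha_bar):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise."""
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise


def prior_preservation_loss(eps_model, x_inst, c_inst, x_prior, c_prior,
                            alpha_bar, rng, prior_weight=1.0):
    """Instance denoising MSE plus prior_weight times class-prior denoising MSE.

    eps_model(x_t, cond) is a hypothetical noise predictor; c_inst and c_prior
    are hypothetical conditioning inputs (e.g. an instance prompt and a class prompt).
    Returns (total_loss, instance_loss, prior_loss).
    """
    # Draw fresh Gaussian noise for each batch of images.
    eps_inst = rng.standard_normal(x_inst.shape)
    eps_prior = rng.standard_normal(x_prior.shape)
    # Noise both batches to the same (scalar) diffusion level.
    xt_inst = add_noise(x_inst, eps_inst, alpha_bar)
    xt_prior = add_noise(x_prior, eps_prior, alpha_bar)
    # Both terms penalize error in predicting the added noise.
    loss_inst = np.mean((eps_model(xt_inst, c_inst) - eps_inst) ** 2)
    loss_prior = np.mean((eps_model(xt_prior, c_prior) - eps_prior) ** 2)
    return loss_inst + prior_weight * loss_prior, loss_inst, loss_prior
```

Setting `prior_weight=0` reduces this to ordinary fine-tuning on the subject images alone; in the DreamBooth formulation, the prior term is what keeps the model's notion of the generic class from drifting while it learns the subject pattern.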
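FID and KID compare feature statistics of real and synthetic images. As a minimal NumPy sketch, assuming the features have already been extracted (conventionally Inception-v3 pooled features; the abstract does not state the extractor or the KID subset protocol), FID is $\lVert\mu_r-\mu_g\rVert^2+\operatorname{Tr}\bigl(\Sigma_r+\Sigma_g-2(\Sigma_r\Sigma_g)^{1/2}\bigr)$ and KID is the unbiased squared MMD with the cubic polynomial kernel $k(x,y)=(x^\top y/d+1)^3$. Standard KID implementations average over random subsets; this sketch uses the full sets.

```python
import numpy as np


def _sqrtm_psd(m):
    """Principal square root of a symmetric positive semi-definite matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T


def fid(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets (rows = images)."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    s = _sqrtm_psd(cov_r)
    # Tr((cov_r cov_g)^{1/2}) equals Tr((s cov_g s)^{1/2}), and s cov_g s is symmetric PSD.
    eig = np.linalg.eigvalsh(s @ cov_g @ s)
    tr_sqrt = np.sqrt(np.clip(eig, 0.0, None)).sum()
    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)


def kid(feats_real, feats_gen):
    """Unbiased squared MMD with the kernel k(x, y) = (x.y / d + 1)^3."""
    d = feats_real.shape[1]

    def kernel(a, b):
        return (a @ b.T / d + 1.0) ** 3

    m, n = len(feats_real), len(feats_gen)
    k_rr = kernel(feats_real, feats_real)
    k_gg = kernel(feats_gen, feats_gen)
    k_rg = kernel(feats_real, feats_gen)
    # Exclude diagonal (self-similarity) terms for the unbiased within-set averages.
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_gg = (k_gg.sum() - np.trace(k_gg)) / (n * (n - 1))
    return float(term_rr + term_gg - 2.0 * k_rg.mean())
```

A quick sanity check: shifting every feature of a set by a constant $c$ leaves the covariance unchanged, so the FID between the original and shifted sets is exactly $d\,c^2$.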
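The three classification scores follow directly from confusion-matrix counts. The abstract does not report the size of the ViT test set, so the counts below are hypothetical. They only illustrate that a balanced set of 400 synthetic positives and 400 real negatives, with two positives missed and no false positives, would reproduce the reported values. Returning 0.0 when a denominator is zero is an assumed convention.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision and recall for a binary classifier."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Convention (assumed): report 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical counts: 400 synthetic positives, 400 real negatives,
# 2 positives classified as negative, no negatives classified as positive.
acc, prec, rec = binary_metrics(tp=398, fp=0, tn=400, fn=2)
# acc = 798 / 800 = 0.9975, prec = 398 / 398 = 1.0, rec = 398 / 400 = 0.995
```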