Hosseini Abdullah, Serag Ahmed
AI Innovation Lab, Weill Cornell Medicine-Qatar, Doha, Qatar.
Front Artif Intell. 2025 Jan 31;7:1454441. doi: 10.3389/frai.2024.1454441. eCollection 2024.
The integration of recent technologies in medical imaging has become a cornerstone of modern healthcare, facilitating detailed analysis of internal anatomy and pathology. Traditional methods, however, are often constrained by data-sharing restrictions that stem from privacy concerns. Emerging techniques in artificial intelligence offer a way around these constraints: synthetic data generation can create realistic medical imaging datasets. Whether such synthetic images preserve critical hidden medical biomarkers, however, remains an open question.
This study employs state-of-the-art Denoising Diffusion Probabilistic Models (DDPMs) integrated with a Swin-transformer-based network to generate synthetic medical data. Three distinct areas of medical imaging are explored: radiology, ophthalmology, and histopathology. The quality of the synthetic images is evaluated with a classifier trained to assess whether medical biomarkers are preserved.
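As a rough illustration of the forward (noising) half of a Denoising Diffusion Probabilistic Model, the sketch below corrupts an image with Gaussian noise according to a fixed variance schedule. It is a minimal sketch and not the study's implementation: the linear schedule, its endpoints (1e-4 to 0.02), the 1000 steps, the 64x64 random "image", and the function names `linear_beta_schedule` and `q_sample` are all assumptions chosen for illustration. The Swin-transformer-based denoising network that would be trained to reverse this process is not shown.

```python
import numpy as np


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances (assumed linear schedule, not the study's)."""
    return np.linspace(beta_start, beta_end, num_steps)


def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t from q(x_t | x_0) in closed form.

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
    """
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise


betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)  # cumulative fraction of signal retained

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(64, 64))  # placeholder for a scaled image

x_early, _ = q_sample(x0, 10, alpha_bar, rng)   # still close to x0
x_late, _ = q_sample(x0, 999, alpha_bar, rng)   # almost pure noise
```

During training, a denoising network is typically asked to predict the `noise` term from `x_t` and `t`; generation then runs the learned reverse process starting from pure noise.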
The diffusion model effectively preserves key medical features, such as lung markings and retinal abnormalities, producing synthetic images that closely resemble real data. Classifier performance demonstrates the reliability of the synthetic data for downstream tasks, with F1 scores and AUC values ranging from 0.8 to 0.99.
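For readers less familiar with the two reported metrics, the following minimal sketch computes a binary F1 score and a ROC AUC from first principles. The labels and scores are made-up toy values, not data from the study, and the helper names `f1_score` and `roc_auc` as well as the 0.5 decision threshold are assumptions for illustration.

```python
def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = biomarker present, 0 = absent (illustrative values only).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]            # hypothetical classifier outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # assumed 0.5 threshold

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

F1 depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two are commonly reported together.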
This work provides insight into the ability of diffusion-based models to generate realistic, biomarker-preserving synthetic images across several medical imaging modalities. The findings suggest that synthetic data can help address challenges such as data scarcity and privacy concerns in clinical practice, research, and education.