Is synthetic data generation effective in maintaining clinical biomarkers? Investigating diffusion models across diverse imaging modalities.

Authors

Hosseini Abdullah, Serag Ahmed

Affiliation

AI Innovation Lab, Weill Cornell Medicine-Qatar, Doha, Qatar.

Publication information

Front Artif Intell. 2025 Jan 31;7:1454441. doi: 10.3389/frai.2024.1454441. eCollection 2024.

DOI:10.3389/frai.2024.1454441

PMID:39959613

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11826350/

Abstract

INTRODUCTION

The integration of recent technologies in medical imaging has become a cornerstone of modern healthcare, facilitating detailed analysis of internal anatomy and pathology. Traditional methods, however, often grapple with data-sharing restrictions due to privacy concerns. Emerging techniques in artificial intelligence offer innovative solutions to overcome these constraints, with synthetic data generation enabling the creation of realistic medical imaging datasets, but the preservation of critical hidden medical biomarkers is an open question.

METHODS

This study employs state-of-the-art Denoising Diffusion Probabilistic Models integrated with a Swin-transformer-based network to generate synthetic medical data. Three distinct areas of medical imaging - radiology, ophthalmology, and histopathology - are explored. The quality of synthetic images is evaluated through a classifier trained to identify the preservation of medical biomarkers.
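The noising half of a Denoising Diffusion Probabilistic Model can be illustrated with a minimal sketch. The abstract does not give the noise schedule, number of steps, or image preprocessing, so the linear schedule (1e-4 to 0.02 over 1000 steps, the common default from the original DDPM formulation), the function names, and the four-pixel toy image below are all assumptions for illustration. The Swin-transformer denoiser that would be trained to predict the added noise is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I).

    x0 is a flat list of pixel values; t is a 0-based index into alpha_bars.
    Returns the noised sample and the Gaussian noise that was mixed in.
    """
    signal = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + noise_scale * e for x, e in zip(x0, noise)]
    return xt, noise


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)  # fixed seed so the sketch is reproducible
x0 = [0.5, -0.25, 1.0, 0.0]  # hypothetical toy "image", pixels scaled to [-1, 1]
xt, noise = q_sample(x0, t=499, alpha_bars=alpha_bars, rng=rng)
```

In training, a denoising network receives `xt` and the step index and is optimized to recover `noise`; generation then runs the learned denoiser backward from pure Gaussian noise.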

RESULTS

The diffusion model effectively preserves key medical features, such as lung markings and retinal abnormalities, producing synthetic images closely resembling real data. Classifier performance demonstrates the reliability of synthetic data for downstream tasks, with F1 and AUC reaching 0.8-0.99.
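The two reported metrics can be computed as in the sketch below. It is written under stated assumptions: a binary biomarker label, a 0.5 decision threshold, and made-up scores that are not taken from the study. The function names are hypothetical, and the AUC uses the standard pairwise (Mann-Whitney) formulation rather than any particular library.

```python
def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = biomarker present, 0 = absent.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]  # made-up classifier outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

F1 depends on the chosen threshold while AUC does not, which is one reason the two are commonly reported together.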

DISCUSSION

This work provides valuable insights into the potential of diffusion-based models for generating realistic, biomarker-preserving synthetic images across various medical imaging modalities. These findings highlight the potential of synthetic data to address challenges such as data scarcity and privacy concerns in clinical practice, research, and education.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4513/11826350/fc9e24390f84/frai-07-1454441-g001.jpg
