Adams Tim, Birkenbihl Colin, Otte Karen, Ng Hwei Geok, Rieling Jonas Adrian, Näher Anatol-Fiete, Sax Ulrich, Prasser Fabian, Fröhlich Holger
Fraunhofer Institute for Algorithms and Scientific Computing, Schloss Birlinghoven, Sankt Augustin, Germany.
Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02115, USA.
iScience. 2025 Apr 14;28(5):112382. doi: 10.1016/j.isci.2025.112382. eCollection 2025 May 16.
The use of synthetic data is a widely discussed and promising solution for privacy-preserving medical research. Synthetic data may, however, not always rule out the risk of re-identifying characteristics of real patients and can vary greatly in terms of data fidelity and utility. We systematically evaluate the trade-offs between privacy, fidelity, and utility across five synthetic data models and three patient-level datasets. We evaluate fidelity based on statistical similarity to the real data, utility on three machine learning use cases, and privacy via membership inference, singling out, and attribute inference risks. Synthetic data without differential privacy (DP) maintained fidelity and utility without evident privacy breaches, whereas DP-enforced models significantly disrupted correlation structures. K-anonymity-based data sanitization of demographic features, while preserving fidelity, introduced notable privacy risks. Our findings emphasize the need to advance methods that effectively balance privacy, fidelity, and utility in synthetic patient data generation.
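To make the fidelity criterion concrete: "statistical similarity to the real data" is typically measured per feature by comparing marginal distributions. The sketch below (my own minimal illustration, not the paper's actual evaluation code) computes the total variation distance between a categorical feature's distribution in real and synthetic records, using only the Python standard library.

```python
from collections import Counter

def marginal_tvd(real_col, synth_col):
    """Total variation distance between the marginal distributions of one
    feature in the real and synthetic datasets. Ranges from 0 (identical
    marginals) to 1 (disjoint); lower values mean higher marginal fidelity."""
    p, q = Counter(real_col), Counter(synth_col)
    n_p, n_q = len(real_col), len(synth_col)
    support = set(p) | set(q)  # all values seen in either dataset
    return 0.5 * sum(abs(p[v] / n_p - q[v] / n_q) for v in support)

# Toy example: a single demographic feature in real vs. synthetic records.
real = ["F", "F", "M", "M", "M", "F", "F", "M"]
synth = ["F", "M", "M", "M", "F", "F", "M", "M"]
print(marginal_tvd(real, synth))  # → 0.125
```

A full fidelity evaluation would aggregate such per-feature scores and additionally compare pairwise correlation structures, which is where the paper reports DP-enforced models degrading most.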