Enhancing privacy protection of physical examination data through synthetic algorithms based on differential privacy.

Authors

Zhang Weili, Liu Ran, Zhu Xinyi, Yu Xiaojin, Jiang Depeng

Affiliations

Department of Epidemiology and Health Statistics, School of Public Health, Southeast University, Nanjing, 210009, China.

Department of Occupational and Environmental Health, School of Public Health, Southeast University, Nanjing, 210009, China.

Publication information

BMC Med Inform Decis Mak. 2025 Sep 1;25(1):324. doi: 10.1186/s12911-025-03109-1.

DOI: 10.1186/s12911-025-03109-1
PMID: 40890720

Abstract

BACKGROUND

Health physical examinations play a crucial role in early detection of cancer and chronic disease. However, privacy concerns limit the utilization of this kind of data for health interventions and research. Synthetic data methods based on differential privacy are increasingly used to create complete datasets that protect privacy while enabling data analysis and result interpretation. Hence, the use of synthetic algorithms based on differential privacy for privacy protection of physical examination data is a promising research direction.

METHODS

Three synthetic algorithms, PrivBayes, PeGS, and DP-Gibbs, were used to generate complete synthetic datasets that adhere to differential privacy standards from physical examination data composed of categorical variables. Their results were compared with those of the existing algorithm, Private-PGM.
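
The abstract does not describe how the algorithms inject noise. As a rough illustration of the general idea only, the sketch below applies the Laplace mechanism to the counts of a single categorical column and then samples synthetic values from the noisy counts. It is not PrivBayes, PeGS, DP-Gibbs, or Private-PGM. The function names, the example categories, and the add-or-remove-one-record neighboring assumption (sensitivity 1) are all assumptions made for illustration.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram(values, categories, epsilon, rng):
    """Noisy category counts for one column (hypothetical helper)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    counts = Counter(values)
    # Assumption: neighboring datasets differ by adding or removing one
    # record, so each record affects exactly one count by 1 (sensitivity 1).
    scale = 1.0 / epsilon
    noisy = {c: counts.get(c, 0) + laplace_noise(scale, rng) for c in categories}
    # Clipping negatives is post-processing and does not weaken the guarantee.
    return {c: max(v, 0.0) for c, v in noisy.items()}


def sample_synthetic(noisy_counts, n, rng):
    """Draw n synthetic values in proportion to the noisy counts."""
    cats = list(noisy_counts)
    weights = [noisy_counts[c] for c in cats]
    if sum(weights) == 0:
        weights = [1.0] * len(cats)  # fall back to uniform if all counts clipped
    return rng.choices(cats, weights=weights, k=n)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical categorical column, not data from the study.
    column = ["normal"] * 60 + ["high"] * 30 + ["low"] * 10
    noisy = dp_histogram(column, ["normal", "high", "low"], epsilon=0.5, rng=rng)
    print(noisy)
    print(sample_synthetic(noisy, 5, rng))
```

A smaller ε gives a larger noise scale, which strengthens privacy at the cost of less faithful synthetic data. Real tabular synthesizers also have to model dependencies between columns, which this single-column sketch ignores.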

RESULTS

At ε = 0.5, DP-Gibbs achieved a privacy-preserving capacity of 4.686, whereas the existing algorithm achieved only 2.012. In addition, DP-Gibbs achieved a precision of 0.620, an F1-score of 0.539, a Kappa coefficient of 0.342, and an AUC of 0.765. The corresponding values for the existing algorithm were 0.520, 0.321, 0.188, and 0.695.
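
For readers unfamiliar with the utility metrics reported above, the sketch below shows how precision, F1-score, Cohen's kappa, and AUC are conventionally computed for a binary classifier. The abstract does not state the prediction task or the evaluation pipeline, so the binary setting, the function names, and the example counts are hypothetical and do not reproduce the study's numbers.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Observed agreement between predictions and true labels.
    p_observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column marginals.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return {"precision": precision, "recall": recall, "f1": f1, "kappa": kappa}


def auc_score(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical counts and scores, chosen only to exercise the functions.
    print(binary_metrics(tp=40, fp=10, fn=20, tn=30))
    print(auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In a synthetic-data evaluation, metrics like these are typically computed for a model trained on the synthetic dataset and tested on real records, so that higher values indicate better preserved utility. Whether the study followed that design is not stated in the abstract.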

CONCLUSIONS

The main contributions of this study are the exploration of combined models that pair different noise forms with Bayesian synthetic algorithms, and a comparative analysis against an existing algorithm. The study examined the balance between privacy protection and data utility at different levels of privacy protection. DP-Gibbs offers more stable technical support for de-identifying physical examination data before sharing and analysis, which enables a wider range of medical data to be mined and applied under privacy protection requirements. With this privacy protection technique, clinical researchers can extract valuable insights on diseases and population health from physical examination data without the risk of leaking private information.
