HT-Fed-GAN: Federated Generative Model for Decentralized Tabular Data Synthesis.

Authors

Duan Shaoming, Liu Chuanyi, Han Peiyi, Jin Xiaopeng, Zhang Xinyi, He Tianyu, Pan Hezhong, Xiang Xiayu

Affiliations

School of Computer Science, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China.

Institute of Data Security, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China.

Publication information

Entropy (Basel). 2022 Dec 31;25(1):88. doi: 10.3390/e25010088.

DOI:10.3390/e25010088
PMID:36673229
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9858387/
Abstract

In this paper, we study privacy-preserving data synthesis (PPDS) for tabular data in a distributed multi-party environment. In decentralized settings, existing PPDS methods use federated generative models with differential privacy. Unfortunately, these models apply only to image or text data, not to tabular data. Unlike images, tabular data usually consist of mixed data types (discrete and continuous attributes), and real-world datasets often have highly imbalanced data distributions. Existing methods struggle to model such scenarios because of the multimodal distributions in the decentralized continuous columns and the highly imbalanced categorical attributes of the clients. To solve these problems, we propose HT-Fed-GAN, a federated generative model for decentralized tabular data synthesis. HT-Fed-GAN has three main components: a federated variational Bayesian Gaussian mixture model (Fed-VB-GMM), designed to handle multimodal distributions; federated conditional one-hot encoding with conditional sampling, for global categorical attribute representation and rebalancing; and a privacy consumption-based federated conditional GAN for privacy-preserving decentralized data modeling. Experimental results on five real-world datasets show that HT-Fed-GAN obtains the best trade-off between data utility and privacy level. In terms of data utility, the tables generated by HT-Fed-GAN are the most statistically similar to the original tables, and the evaluation scores show that HT-Fed-GAN outperforms the state-of-the-art model on machine learning tasks.
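
The abstract names "federated conditional one-hot encoding with conditional sampling" for global categorical representation and rebalancing, but does not spell out the procedure. The following is only a minimal sketch of the general idea, under stated assumptions: clients share per-category counts (not rows), the counts are summed into a global vocabulary, and a condition category is drawn with log-frequency weighting, a rebalancing heuristic borrowed from CTGAN-style training-by-sampling. All function names and the toy data are hypothetical and are not taken from the paper; a privacy-preserving version would also need to protect the shared counts.

```python
import math
import random
from collections import Counter


def global_category_counts(client_columns):
    """Sum per-client category counts; only counts, not rows, are shared."""
    total = Counter()
    for column in client_columns:
        total.update(Counter(column))
    return total


def one_hot(category, vocabulary):
    """Encode a category against the shared, sorted global vocabulary."""
    return [1 if category == v else 0 for v in vocabulary]


def log_frequency_probs(counts):
    """Probabilities proportional to log(1 + count), which flattens imbalance."""
    weights = {c: math.log1p(n) for c, n in counts.items()}
    norm = sum(weights.values())
    return {c: w / norm for c, w in weights.items()}


# Hypothetical toy data: one categorical column held by three clients.
clients = [
    ["A"] * 90 + ["B"] * 10,
    ["A"] * 80 + ["C"] * 20,
    ["A"] * 95 + ["B"] * 4 + ["D"] * 1,
]

counts = global_category_counts(clients)
vocabulary = sorted(counts)          # every client encodes with this ordering
probs = log_frequency_probs(counts)

# Draw one condition category and build the condition vector for a generator.
rng = random.Random(0)
condition = rng.choices(vocabulary, weights=[probs[v] for v in vocabulary], k=1)[0]
condition_vector = one_hot(condition, vocabulary)
```

In this toy setup, category "A" makes up 265 of 300 rows, yet under log-frequency weighting it is drawn less than half the time, so rare categories such as "D" are seen far more often during training than their raw frequency would allow.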
