Tabular transformer generative adversarial network for heterogeneous distribution in healthcare.

Authors

Kang Ha Ye Jin, Ko Minsam, Ryu Kwang Sun

Affiliations

Department of Applied Artificial Intelligence, Hanyang University, Seoul, Republic of Korea.

Department of Public Health & AI, Graduate School of Cancer Science and Policy, National Cancer Center, Goyang, Republic of Korea.

Publication information

Sci Rep. 2025 Mar 25;15(1):10254. doi: 10.1038/s41598-025-93077-3.

DOI:10.1038/s41598-025-93077-3
PMID:40133347
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11937286/
Abstract

In healthcare, the most common type of data is tabular data, which holds high significance and potential in the field of medical AI. However, privacy concerns have hindered its widespread use. Although synthetic data has emerged as a viable solution, generating healthcare tabular data (HTD) is complex because the variables within each record are extensively interdependent and cover diverse clinical characteristics, including sensitive information. To overcome these issues, this study proposes a tabular transformer generative adversarial network (TT-GAN) that generates synthetic data while accounting for the relationships between variables potentially present in an HTD dataset. Transformers can model the relationships between the columns in each record using a multi-head attention mechanism. In addition, to address the potential risk of sensitive patient information being restored, the Transformer was embedded in a generative adversarial network (GAN) architecture, yielding an implicit-based algorithm. To handle the heterogeneous characteristics of the continuous variables in the HTD dataset, discretization and converter methodologies were applied. The experiments showed that TT-GAN outperformed the Conditional Tabular GAN (CTGAN) and copula GAN. Discretization and converters proved effective with the proposed Transformer algorithm, whereas Transformer-based models without discretization and converters performed significantly worse. For CTGAN and copula GAN, the discretization and converter methodologies had minimal effect. Thus, TT-GAN shows considerable potential in healthcare, generating artificial data that closely resembles real healthcare datasets. Its ability to efficiently handle mixed variable types, including polynomial, discrete, and continuous variables, demonstrates its versatility and practicality in healthcare research and data synthesis.
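
The abstract does not spell out how the discretization and converter steps work, so the following is only a minimal sketch of one plausible reading: continuous values are mapped to quantile bins before modeling, and generated bin indices are mapped back to continuous values by sampling uniformly within each bin. Quantile binning, uniform within-bin sampling, and all function names (`fit_quantile_bins`, `discretize`, `convert_back`) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def fit_quantile_bins(values, n_bins=10):
    """Return bin edges from empirical quantiles (assumed discretization scheme)."""
    edges = np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1))
    return np.unique(edges)  # drop duplicate edges caused by tied values

def discretize(values, edges):
    """Map each continuous value to the index of the bin that contains it."""
    # Using interior edges only keeps indices in [0, len(edges) - 2].
    return np.digitize(values, edges[1:-1], right=False)

def convert_back(bin_ids, edges, rng):
    """Map bin indices back to continuous values (assumed converter step)."""
    low = edges[bin_ids]
    high = edges[bin_ids + 1]
    return rng.uniform(low, high)  # uniform draw inside each bin

rng = np.random.default_rng(0)
# Hypothetical skewed clinical measurement, simulated as log-normal.
x = rng.lognormal(mean=1.0, sigma=0.8, size=1000)

edges = fit_quantile_bins(x, n_bins=10)
ids = discretize(x, edges)            # categorical view a generator could model
x_back = convert_back(ids, edges, rng)  # continuous values recovered per bin
```

Quantile bins are used here because they give each category roughly equal mass even for skewed variables; equal-width bins or a learned mixture model would be alternative choices under the same interface.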

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c25/11937286/84ad8d5f9e7c/41598_2025_93077_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c25/11937286/3a26954c2526/41598_2025_93077_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c25/11937286/7e4e1d475e52/41598_2025_93077_Figa_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c25/11937286/a698bf5c2306/41598_2025_93077_Figb_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c25/11937286/c99e139b4696/41598_2025_93077_Figc_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c25/11937286/d01491680e37/41598_2025_93077_Figd_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c25/11937286/e0cd2667bfa4/41598_2025_93077_Fige_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c25/11937286/ed07d16c5278/41598_2025_93077_Figf_HTML.jpg

Similar articles

1
Tabular transformer generative adversarial network for heterogeneous distribution in healthcare.
Sci Rep. 2025 Mar 25;15(1):10254. doi: 10.1038/s41598-025-93077-3.
2
Enhanced Conditional GAN for High-Quality Synthetic Tabular Data Generation in Mobile-Based Cardiovascular Healthcare.
Sensors (Basel). 2024 Nov 30;24(23):7673. doi: 10.3390/s24237673.
3
Personal health data protection and intelligent healthcare applications under generative adversarial network.
Sci Rep. 2025 May 13;15(1):16558. doi: 10.1038/s41598-025-01575-1.
4
Synthetic Tabular Data Based on Generative Adversarial Networks in Health Care: Generation and Validation Using the Divide-and-Conquer Strategy.
JMIR Med Inform. 2023 Nov 24;11:e47859. doi: 10.2196/47859.
5
Large Language Models for Synthetic Tabular Health Data: A Benchmark Study.
Stud Health Technol Inform. 2024 Aug 22;316:963-967. doi: 10.3233/SHTI240571.
6
Utility-based Analysis of Statistical Approaches and Deep Learning Models for Synthetic Data Generation With Focus on Correlation Structures: Algorithm Development and Validation.
JMIR AI. 2025 Mar 20;4:e65729. doi: 10.2196/65729.
7
Semantic representation and comparative analysis of physical activity sensor observations using MOX2-5 sensor in real and synthetic datasets: a proof-of-concept-study.
Sci Rep. 2024 Feb 26;14(1):4634. doi: 10.1038/s41598-024-55183-6.
8
Generative adversarial network based synthetic data training model for lightweight convolutional neural networks.
Multimed Tools Appl. 2023 May 20:1-23. doi: 10.1007/s11042-023-15747-6.
9
CTAB-GAN+: enhancing tabular data synthesis.
Front Big Data. 2024 Jan 8;6:1296508. doi: 10.3389/fdata.2023.1296508. eCollection 2023.
10
Generative artificial intelligence to produce high-fidelity blastocyst-stage embryo images.
Hum Reprod. 2024 Jun 3;39(6):1197-1207. doi: 10.1093/humrep/deae064.

Cited by

1
Enhancing body fat prediction with WGAN-GP data augmentation and XGBoost algorithm.
Sci Prog. 2025 Jul-Sep;108(3):368504251366850. doi: 10.1177/00368504251366850. Epub 2025 Aug 6.
