Generative AI mitigates representation bias and improves model fairness through synthetic health data.

Author information

Marchesi Raffaele, Micheletti Nicolo, I-Hsien Kuo Nicholas, Barbieri Sebastiano, Jurman Giuseppe, Osmani Venet

Affiliations

Data Science for Health (DSH), Fondazione Bruno Kessler, Trento, Italy.

Department of Mathematics, University of Pavia, Pavia, Italy.

Publication information

PLoS Comput Biol. 2025 May 19;21(5):e1013080. doi: 10.1371/journal.pcbi.1013080. eCollection 2025 May.

DOI:10.1371/journal.pcbi.1013080
PMID:40388536
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12112403/
Abstract

Representation bias in health data can lead to unfair decisions and compromise the generalisability of research findings. As a consequence, underrepresented subpopulations, such as those from specific ethnic backgrounds or genders, do not benefit equally from clinical discoveries. Several approaches have been developed to mitigate representation bias, ranging from simple resampling methods, such as SMOTE, to recent approaches based on generative adversarial networks (GAN). However, generating high-dimensional time-series synthetic health data remains a significant challenge. In response, we devised a novel architecture (CA-GAN) that synthesises authentic, high-dimensional time series data. CA-GAN outperforms state-of-the-art methods in a qualitative and a quantitative evaluation while avoiding mode collapse, a serious GAN failure. We perform evaluation using 7535 patients with hypotension and sepsis from two diverse, real-world clinical datasets. We show that synthetic data generated by our CA-GAN improves model fairness in Black patients as well as female patients when evaluated separately for each subpopulation. Furthermore, CA-GAN generates authentic data of the minority class while faithfully maintaining the original distribution of data, resulting in improved performance in a downstream predictive task.
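
For orientation, the simple resampling baseline mentioned above (SMOTE) can be sketched in a few lines. This is not the CA-GAN architecture and is not taken from the paper: it is a minimal, SMOTE-style illustration that assumes static numeric feature vectors, and the function name `smote_like_oversample`, its parameters, and the toy data are all hypothetical. It creates each synthetic sample by interpolating between a minority-class sample and one of its nearest minority-class neighbours.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).

    Hypothetical helper for illustration only; expects at least two samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        t = rng.random()
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority-class data (made up for this sketch, not from the study).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_like_oversample(minority, n_new=6)
```

Because every synthetic point lies on a segment between two existing samples, this kind of interpolation treats each sample as a flat vector and does not model temporal structure, which is the setting the abstract identifies as the remaining challenge.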

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e03a/12112403/813e1a5dcdeb/pcbi.1013080.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e03a/12112403/aaefb4adab0f/pcbi.1013080.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e03a/12112403/bcc64242a958/pcbi.1013080.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e03a/12112403/9651dd45eccb/pcbi.1013080.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e03a/12112403/3f4135c6b30c/pcbi.1013080.g005.jpg
