Enhancing supervised analysis of imbalanced untargeted metabolomics datasets using a CWGAN-GP framework for data augmentation.

Authors

Traquete Francisco, Sousa Silva Marta, Ferreira António E N

Affiliations

FT-ICR and Structural Mass Spectrometry Laboratory, Faculdade de Ciências, Universidade de Lisboa, Portugal; Biosystems and Integrative Sciences Institute (BioISI), Faculdade de Ciências, Universidade de Lisboa, Campo Grande, 1749-016, Lisboa, Portugal.

Publication information

Comput Biol Med. 2025 Jan;184:109414. doi: 10.1016/j.compbiomed.2024.109414. Epub 2024 Nov 14.

DOI:10.1016/j.compbiomed.2024.109414

PMID:39546879

Abstract

Untargeted metabolomics is an extremely useful approach for the discrimination of biological systems and biomarker identification. However, data analysis workflows are complex and face many challenges. Two of these challenges are the need for large sample sizes and the possibility of severe class imbalance, which is particularly common in clinical studies. The latter can make statistical models less generalizable, increase the risk of overfitting and skew the analysis in favour of the majority class. One possible approach to mitigate this problem is data augmentation. However, the use of artificial data requires adequate data augmentation methods and criteria for assessing the quality of the generated data. In this work, we used Conditional Wasserstein Generative Adversarial Networks with Gradient Penalty (CWGAN-GPs) for data augmentation of metabolomics data. Using a set of benchmark datasets, we applied several criteria for the evaluation of the quality of generated data and assessed the performance of supervised predictive models trained with datasets that included such data. CWGAN-GP models generated realistic data with identical characteristics to real samples, mostly avoiding mode collapse. Furthermore, in cases of class imbalance, the performance of predictive models improved when the minority class was supplemented with generated samples. This improvement was evident for high quality datasets with well separated classes. Conversely, model improvements were quite modest for datasets with high class overlap. This trend was confirmed by using synthetic datasets with different class separation levels. Data augmentation is a viable procedure to alleviate class imbalance problems but is not universally beneficial in metabolomics.
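The abstract does not describe the network architecture or training settings, so the following is only a minimal sketch of the general CWGAN-GP critic objective, not the authors' implementation. It assumes a toy one-hidden-layer critic whose input gradient can be written analytically; the function names (`critic`, `critic_grad_x`, `gradient_penalty`, `critic_loss`), the layer sizes, and the penalty weight `lam=10.0` are all illustrative assumptions.

```python
import numpy as np

def critic(x, y, params):
    """Toy conditional critic D(x | y) = a . tanh(W x + U y + b) (hypothetical)."""
    W, U, b, a = params
    h = np.tanh(x @ W.T + y @ U.T + b)
    return h @ a

def critic_grad_x(x, y, params):
    """Analytic gradient of the toy critic with respect to x, one row per sample."""
    W, U, b, a = params
    h = np.tanh(x @ W.T + y @ U.T + b)
    return ((1.0 - h ** 2) * a) @ W

def gradient_penalty(x_real, x_fake, y, params, rng, lam=10.0):
    """Gradient penalty: lam * mean((||grad_x D(x_hat | y)||_2 - 1)^2)."""
    # x_hat is sampled uniformly on the segment between each real/generated pair.
    eps = rng.uniform(size=(x_real.shape[0], 1))
    x_hat = eps * x_real + (1.0 - eps) * x_fake
    norms = np.linalg.norm(critic_grad_x(x_hat, y, params), axis=1)
    return lam * np.mean((norms - 1.0) ** 2)

def critic_loss(x_real, x_fake, y, params, rng, lam=10.0):
    """Critic loss: E[D(fake | y)] - E[D(real | y)] + gradient penalty."""
    wasserstein = critic(x_fake, y, params).mean() - critic(x_real, y, params).mean()
    return wasserstein + gradient_penalty(x_real, x_fake, y, params, rng, lam)

# Random stand-ins for real and generated feature vectors (not metabolomics data).
rng = np.random.default_rng(0)
n, d, c, hidden = 8, 5, 2, 4
params = (rng.normal(size=(hidden, d)), rng.normal(size=(hidden, c)),
          rng.normal(size=hidden), rng.normal(size=hidden))
x_real = rng.normal(size=(n, d))
x_fake = rng.normal(size=(n, d))
y = np.eye(c)[rng.integers(0, c, size=n)]  # one-hot class labels as the condition
print(critic_loss(x_real, x_fake, y, params, rng))
```

In an actual training loop the critic and generator would be neural networks optimised with automatic differentiation; the analytic gradient here only keeps the sketch dependency-light.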

Similar articles

EEG Data Augmentation for Emotion Recognition Using a Conditional Wasserstein GAN.

Annu Int Conf IEEE Eng Med Biol Soc. 2018 Jul;2018:2535-2538. doi: 10.1109/EMBC.2018.8512865.

Improved BCI calibration in multimodal emotion recognition using heterogeneous adversarial transfer learning.

PeerJ Comput Sci. 2025 Jan 20;11:e2649. doi: 10.7717/peerj-cs.2649. eCollection 2025.

BWGAN-GP: An EEG Data Generation Method for Class Imbalance Problem in RSVP Tasks.

IEEE Trans Neural Syst Rehabil Eng. 2022;30:251-263. doi: 10.1109/TNSRE.2022.3145515. Epub 2022 Feb 2.

Data augmentation for enhancing EEG-based emotion recognition with deep generative models.

J Neural Eng. 2020 Oct 14;17(5):056021. doi: 10.1088/1741-2552/abb580.

2S-BUSGAN: A Novel Generative Adversarial Network for Realistic Breast Ultrasound Image with Corresponding Tumor Contour Based on Small Datasets.

Sensors (Basel). 2023 Oct 20;23(20):8614. doi: 10.3390/s23208614.

Crash injury severity prediction considering data imbalance: A Wasserstein generative adversarial network with gradient penalty approach.

Accid Anal Prev. 2023 Nov;192:107271. doi: 10.1016/j.aap.2023.107271. Epub 2023 Aug 31.

Generative AI with WGAN-GP for boosting seizure detection accuracy.

Front Artif Intell. 2024 Oct 2;7:1437315. doi: 10.3389/frai.2024.1437315. eCollection 2024.

CEGAN: Classification Enhancement Generative Adversarial Networks for unraveling data imbalance problems.

Neural Netw. 2021 Jan;133:69-86. doi: 10.1016/j.neunet.2020.10.004. Epub 2020 Oct 17.

Data augmentation-based conditional Wasserstein generative adversarial network-gradient penalty for XSS attack detection system.

PeerJ Comput Sci. 2020 Dec 14;6:e328. doi: 10.7717/peerj-cs.328. eCollection 2020.

Cited by

Research on APT groups malware classification based on TCN-GAN.

PLoS One. 2025 Jun 10;20(6):e0323377. doi: 10.1371/journal.pone.0323377. eCollection 2025.
