phylaGAN: data augmentation through conditional GANs and autoencoders for improving disease prediction accuracy using microbiome data.

Affiliations

Biostatistics Department, Princess Margaret Cancer Center, University Health Network, Toronto, ON, M5G2C4, Canada.

Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, M5T3M7, Canada.

Publication information

Bioinformatics. 2024 Mar 29;40(4). doi: 10.1093/bioinformatics/btae161.

DOI:10.1093/bioinformatics/btae161
PMID:38569898
Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11256914/
Abstract

MOTIVATION

Research is improving our understanding of how the microbiome interacts with the human body and how it affects human health. Existing machine learning methods have shown great potential in discriminating healthy from diseased microbiome states. However, machine learning-based prediction from microbiome data faces challenges such as small sample sizes, imbalance between cases and controls, and the high cost of collecting large numbers of samples. To address these challenges, we propose phylaGAN, a deep learning framework that augments existing datasets with generated microbiome data using a combination of a conditional generative adversarial network (C-GAN) and an autoencoder. The C-GAN trains two models against each other to generate larger simulated datasets that are representative of the original dataset. The autoencoder maps the original and the generated samples onto a common subspace to make prediction more accurate.
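The conditioning step and the class-imbalance motivation described above can be illustrated with a minimal sketch. This is not code from the phylaGAN repository: the function names, the cohort sizes, and the noise dimension are all hypothetical, and the generator and discriminator networks themselves are omitted. It only shows how a class label is commonly attached to a noise vector so that a conditional generator can be asked for samples of a specific class, and how many such samples would be needed to balance cases and controls.

```python
import random


def one_hot(label, n_classes):
    """Return a one-hot list encoding `label` among `n_classes` classes."""
    return [1.0 if i == label else 0.0 for i in range(n_classes)]


def conditional_input(noise, label, n_classes):
    """Append the one-hot class label to the noise vector.

    Feeding the label alongside the noise is a common way to tell a
    conditional GAN generator which class (e.g. case or control) to produce.
    """
    return list(noise) + one_hot(label, n_classes)


def augmentation_plan(labels):
    """Count how many synthetic samples each class needs to match the largest class."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    target = max(counts.values())
    return {y: target - n for y, n in counts.items()}


# Hypothetical imbalanced cohort: 12 cases (label 1) and 30 controls (label 0).
labels = [1] * 12 + [0] * 30
plan = augmentation_plan(labels)

# Build one generator input per requested synthetic sample.
rng = random.Random(0)
noise_dim = 4  # illustrative value only
requests = [
    conditional_input([rng.gauss(0.0, 1.0) for _ in range(noise_dim)], y, 2)
    for y, n_missing in plan.items()
    for _ in range(n_missing)
]
print(plan, len(requests))
```

In a full pipeline, each entry of `requests` would be passed to a trained generator, and the resulting synthetic profiles would be pooled with the original samples before the autoencoder step.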

RESULTS

Extensive evaluation and predictive analysis were conducted on two datasets, a type 2 diabetes (T2D) study and a cirrhosis study, where data augmentation improved mean AUC by 11% and 5%, respectively. External validation on a smaller cohort, classifying obese versus lean subjects, showed an improvement in mean AUC of close to 32% when the cohort was augmented through phylaGAN, compared with using the original cohort alone. Our findings not only indicate that generative adversarial networks can create samples that mimic the original data across various diversity metrics, but also highlight the potential of enhancing disease prediction through machine learning models trained on synthetic data.
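To make the metric concrete, the sketch below computes AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, then compares mean AUC with and without augmentation. The abstract does not state whether the reported percentages are absolute or relative gains, so both are computed here. All labels, scores, and per-fold AUC values are hypothetical and are not results from the paper.

```python
def auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    Counts the fraction of (positive, negative) pairs in which the positive
    sample gets the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical predictions for two cases (1) and two controls (0).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
example_auc = auc(labels, scores)

# Hypothetical per-fold AUCs without and with synthetic samples.
baseline_aucs = [0.60, 0.64, 0.62]
augmented_aucs = [0.70, 0.72, 0.71]

absolute_gain = mean(augmented_aucs) - mean(baseline_aucs)
relative_gain = absolute_gain / mean(baseline_aucs)
print(example_auc, round(absolute_gain, 3), round(relative_gain, 3))
```

The pairwise form is quadratic in the number of samples, which is acceptable for small cohorts of the kind discussed here; rank-based implementations are the usual choice for larger datasets.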

AVAILABILITY AND IMPLEMENTATION

https://github.com/divya031090/phylaGAN

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1dbf/11256914/31073544b22a/btae161f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1dbf/11256914/6864c030608c/btae161f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1dbf/11256914/d64c22095128/btae161f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1dbf/11256914/99de870e120d/btae161f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1dbf/11256914/c42950b46a85/btae161f5.jpg

Similar articles

1
phylaGAN: data augmentation through conditional GANs and autoencoders for improving disease prediction accuracy using microbiome data.
Bioinformatics. 2024 Mar 29;40(4). doi: 10.1093/bioinformatics/btae161.
2
A Comparative Analysis of the Novel Conditional Deep Convolutional Neural Network Model, Using Conditional Deep Convolutional Generative Adversarial Network-Generated Synthetic and Augmented Brain Tumor Datasets for Image Classification.
Brain Sci. 2024 May 30;14(6):559. doi: 10.3390/brainsci14060559.
3
Utilization of Synthetic Near-Infrared Spectra via Generative Adversarial Network to Improve Wood Stiffness Prediction.
Sensors (Basel). 2024 Mar 21;24(6):1992. doi: 10.3390/s24061992.
4
DeepMicroGen: a generative adversarial network-based method for longitudinal microbiome data imputation.
Bioinformatics. 2023 May 4;39(5). doi: 10.1093/bioinformatics/btad286.
5
Improving mixed-integer temporal modeling by generating synthetic data using conditional generative adversarial networks: A case study of fluid overload prediction in the intensive care unit.
Comput Biol Med. 2024 Jan;168:107749. doi: 10.1016/j.compbiomed.2023.107749. Epub 2023 Nov 22.
6
Data augmentation for enhancing EEG-based emotion recognition with deep generative models.
J Neural Eng. 2020 Oct 14;17(5):056021. doi: 10.1088/1741-2552/abb580.
7
Increasing prediction accuracy of pathogenic staging by sample augmentation with a GAN.
PLoS One. 2021 Apr 27;16(4):e0250458. doi: 10.1371/journal.pone.0250458. eCollection 2021.
8
Generative adversarial network based synthetic data training model for lightweight convolutional neural networks.
Multimed Tools Appl. 2023 May 20:1-23. doi: 10.1007/s11042-023-15747-6.
9
Improving mortality prediction in Acute Pancreatitis by machine learning and data augmentation.
Comput Biol Med. 2022 Nov;150:106077. doi: 10.1016/j.compbiomed.2022.106077. Epub 2022 Sep 11.
10
A medical image classification method based on self-regularized adversarial learning.
Med Phys. 2024 Nov;51(11):8232-8246. doi: 10.1002/mp.17320. Epub 2024 Jul 30.

Cited by

1
Exploring the potential of cell-free RNA and Pyramid Scene Parsing Network for early preeclampsia screening.
BMC Pregnancy Childbirth. 2025 Apr 14;25(1):445. doi: 10.1186/s12884-025-07503-5.
2
PhyloMix: enhancing microbiome-trait association prediction through phylogeny-mixing augmentation.
Bioinformatics. 2025 Feb 4;41(2). doi: 10.1093/bioinformatics/btaf014.
3
Exploring the frontier of microbiome biomarker discovery with artificial intelligence.
Natl Sci Rev. 2024 Sep 13;11(11):nwae325. doi: 10.1093/nsr/nwae325. eCollection 2024 Nov.
4
In silico generation and augmentation of regulatory variants from massively parallel reporter assay using conditional variational autoencoder.
bioRxiv. 2024 Jun 29:2024.06.25.600715. doi: 10.1101/2024.06.25.600715.

References

1
DeepMicroGen: a generative adversarial network-based method for longitudinal microbiome data imputation.
Bioinformatics. 2023 May 4;39(5). doi: 10.1093/bioinformatics/btad286.
2
DeepGeni: deep generalized interpretable autoencoder elucidates gut microbiota for better cancer immunotherapy.
Sci Rep. 2023 Mar 21;13(1):4599. doi: 10.1038/s41598-023-31210-w.
3
phyLoSTM: a novel deep learning model on disease prediction from longitudinal microbiome data.
Bioinformatics. 2021 Nov 5;37(21):3707-3714. doi: 10.1093/bioinformatics/btab482.
4
MB-GAN: Microbiome Simulation via Generative Adversarial Network.
Gigascience. 2021 Feb 5;10(2). doi: 10.1093/gigascience/giab005.
5
Harnessing machine learning for development of microbiome therapeutics.
Gut Microbes. 2021 Jan-Dec;13(1):1-20. doi: 10.1080/19490976.2021.1872323.
6
Microbiome definition re-visited: old concepts and new challenges.
Microbiome. 2020 Jun 30;8(1):103. doi: 10.1186/s40168-020-00875-0.
7
TaxoNN: ensemble of neural networks on stratified microbiome data for disease prediction.
Bioinformatics. 2020 Nov 1;36(17):4544-4550. doi: 10.1093/bioinformatics/btaa542.
8
DeepMicro: deep representation learning for disease prediction based on microbiome data.
Sci Rep. 2020 Apr 7;10(1):6026. doi: 10.1038/s41598-020-63159-5.
9
The Relative Performance of Ensemble Methods with Deep Convolutional Neural Networks for Image Classification.
J Appl Stat. 2018;45(15):2800-2818. doi: 10.1080/02664763.2018.1441383. Epub 2018 Feb 26.
10
MetaPheno: A critical evaluation of deep learning and machine learning in metagenome-based disease prediction.
Methods. 2019 Aug 15;166:74-82. doi: 10.1016/j.ymeth.2019.03.003. Epub 2019 Mar 16.