PhyloMix: enhancing microbiome-trait association prediction through phylogeny-mixing augmentation.

Authors

Jiang Yifan, Liao Disen, Zhu Qiyun, Lu Yang Young

Affiliations

Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada.

School of Life Sciences, Arizona State University, Tempe, AZ, 85281, United States.

Publication

Bioinformatics. 2025 Feb 4;41(2). doi: 10.1093/bioinformatics/btaf014.

DOI: 10.1093/bioinformatics/btaf014
PMID: 39799515
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11849959/
Abstract

MOTIVATION

Understanding the associations between traits and microbial composition is a fundamental objective in microbiome research. Recently, researchers have turned to machine learning (ML) models to achieve this goal with promising results. However, the effectiveness of advanced ML models is often limited by the unique characteristics of microbiome data, which are typically high-dimensional, compositional, and imbalanced. These characteristics can hinder the models' ability to fully explore the relationships among taxa in predictive analyses. To address this challenge, data augmentation has become crucial. It involves generating synthetic samples with artificial labels based on existing data and incorporating these samples into the training set to improve ML model performance.
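The simplest form of this kind of augmentation is vanilla mixup, one of the baselines named in the results. The sketch below is a generic illustration, not code from the PhyloMix repository: it draws a mixing weight lam from a Beta(alpha, alpha) distribution, forms the synthetic sample x_new = lam * x_i + (1 - lam) * x_j, and gives it the artificial label y_new = lam * y_i + (1 - lam) * y_j. The function name `mixup`, the default `alpha=0.2`, and the use of scalar labels (for example a class probability) are illustrative assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mixup(
    x_i: Sequence[float],
    y_i: float,
    x_j: Sequence[float],
    y_j: float,
    alpha: float = 0.2,  # assumed default; controls how extreme the mixing weights are
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], float, float]:
    """Vanilla mixup: convex combination of two samples and of their labels."""
    if len(x_i) != len(x_j):
        raise ValueError("samples must have the same number of taxa")
    rng = rng or random.Random()
    # Mixing weight drawn from Beta(alpha, alpha)
    lam = rng.betavariate(alpha, alpha)
    # Synthetic sample: taxon-by-taxon weighted average of the two inputs
    x_new = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    # Artificial label mixed with the same weight
    y_new = lam * y_i + (1.0 - lam) * y_j
    return x_new, y_new, lam


# Usage: two relative-abundance profiles over three taxa, with binary labels 1 and 0
x_new, y_new, lam = mixup([0.5, 0.5, 0.0], 1.0, [0.0, 0.2, 0.8], 0.0, rng=random.Random(0))
print(round(sum(x_new), 6), 0.0 <= y_new <= 1.0)
```

Because the result is a convex combination, two profiles that each sum to 1 yield a synthetic profile that also sums to 1. Note that mixup treats every taxon independently and does not use the phylogeny at all, whereas PhyloMix uses it as a prior.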

RESULTS

Here, we propose PhyloMix, a novel data augmentation method designed specifically for microbiome data to enhance predictive analyses. PhyloMix uses the phylogenetic relationships among microbiome taxa as an informative prior to guide the generation of synthetic microbial samples: it creates a new sample by removing a subtree from one sample and combining the rest of that sample with the corresponding subtree from another sample. Notably, PhyloMix is designed to address the compositional nature of microbiome data and handles both raw counts and relative abundances. This approach introduces sufficient diversity into the augmented samples, leading to improved predictive performance. We empirically evaluated PhyloMix on six real microbiome datasets with five commonly used ML models. PhyloMix significantly outperforms several baseline methods, including the sample-mixing-based data augmentation techniques vanilla mixup and compositional cutmix, as well as the phylogeny-based method TADA. We also demonstrated the wide applicability of PhyloMix in both supervised learning and contrastive representation learning.
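The subtree-swapping idea described above can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the authors' implementation (their released code is in the repository linked in the availability section). The sketch makes the following assumptions. The tree is a hypothetical dict of child lists whose integer leaves are taxon column indices. Both samples are first closed to proportions, so raw counts and relative abundances are treated alike. The leaves of the chosen subtree are copied from sample j into sample i, and the result is renormalized. The label weight `lam` is taken to be the share of the mixed composition that still comes from sample i. The names `subtree_mix`, `leaves_under`, and `to_proportions`, as well as this label rule, are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence, Set, Tuple, Union

# Hypothetical tree encoding: internal node name -> children.
# An int child is a leaf, i.e. the column index of a taxon in the sample vector.
Tree = Dict[str, List[Union[str, int]]]


def leaves_under(tree: Tree, node: Union[str, int]) -> Set[int]:
    """Return the taxon indices of all leaves in the subtree rooted at `node`."""
    if isinstance(node, int):
        return {node}
    found: Set[int] = set()
    for child in tree[node]:
        found |= leaves_under(tree, child)
    return found


def to_proportions(x: Sequence[float]) -> List[float]:
    """Close a count or abundance vector so that it sums to 1."""
    total = float(sum(x))
    if total <= 0:
        raise ValueError("sample has no positive abundance")
    return [v / total for v in x]


def subtree_mix(
    x_i: Sequence[float], y_i: float,
    x_j: Sequence[float], y_j: float,
    tree: Tree, node: Optional[str] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], float]:
    """Replace the subtree at `node` in sample i with the same subtree from sample j."""
    if node is None:
        rng = rng or random.Random()
        node = rng.choice(sorted(tree))  # pick an internal node at random
    p_i, p_j = to_proportions(x_i), to_proportions(x_j)
    swapped = leaves_under(tree, node)
    # Leaves in the subtree come from sample j, all others from sample i;
    # renormalize (raises ValueError if the mix has no positive mass)
    mixed = to_proportions([p_j[k] if k in swapped else p_i[k] for k in range(len(p_i))])
    # Assumed label rule: weight each parent by its share of the mixed composition
    lam = sum(v for k, v in enumerate(mixed) if k not in swapped)
    return mixed, lam * y_i + (1.0 - lam) * y_j
```

For example, with the toy tree `{"root": ["A", "B"], "A": [0, 1], "B": [2, 3]}`, swapping subtree `"B"` from x_j = [0, 0, 2, 6] into x_i = [1, 1, 1, 1] gives [1/6, 1/6, 1/6, 1/2]. Under the assumed label rule, this mix has a label of 1/3 when y_i = 1 and y_j = 0. Swapping at the root simply returns sample j.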

AVAILABILITY AND IMPLEMENTATION

The Apache-licensed source code is available at (https://github.com/batmen-lab/phylomix).

Figures
Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/833f/11849959/f43101095321/btaf014f1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/833f/11849959/59daea3ff3c5/btaf014f2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/833f/11849959/bed53688b37c/btaf014f3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/833f/11849959/24c129e25826/btaf014f4.jpg
