A robust approach for identifying differentially abundant features in metagenomic samples.

Authors

Sohn Michael B, Du Ruofei, An Lingling

Affiliations

Interdisciplinary Program in Statistics and.

Department of Agricultural and Biosystems Engineering, University of Arizona, Tucson, AZ 85721, USA.

Publication information

Bioinformatics. 2015 Jul 15;31(14):2269-75. doi: 10.1093/bioinformatics/btv165. Epub 2015 Mar 19.

DOI:10.1093/bioinformatics/btv165

PMID:25792553

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4495302/

Abstract

MOTIVATION

The analysis of differential abundance for features (e.g. species or genes) can give us a better understanding of microbial communities and their behavior. However, it can also mislead us about the characteristics of those communities if the abundances or counts of features on different scales are not properly normalized within and between communities before the analysis. Normalization methods used in differential analysis typically try to adjust counts on different scales to a common scale using the total sum, mean or median of representative features across all samples. These methods often yield undesirable results when the difference in total counts of differentially abundant features (DAFs) across different conditions is large.
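The normalization problem described above can be made concrete with a small sketch. The counts and the helper name `total_sum_scale` below are made up for illustration and are not taken from the paper. Only one feature truly changes between the two conditions, yet total-sum scaling makes every other feature appear to drop.

```python
def total_sum_scale(counts):
    """Convert raw counts to proportions of the sample total."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical abundances of four features in two conditions.
# Only feature 0 truly differs (10x higher in condition B).
condition_a = [100, 100, 100, 100]
condition_b = [1000, 100, 100, 100]

prop_a = total_sum_scale(condition_a)
prop_b = total_sum_scale(condition_b)

# Apparent fold change of each feature (B relative to A) after scaling.
fold_tss = [b / a for a, b in zip(prop_a, prop_b)]

# Feature 0 looks only about 3.08x higher instead of 10x, and the three
# unchanged features look about 0.31x lower even though they did not change.
print([round(f, 2) for f in fold_tss])
```

The distortion grows with the size of the true difference in feature 0, which is the situation the last sentence of the motivation refers to.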

RESULTS

We develop a novel method, Ratio Approach for Identifying Differential Abundance (RAIDA), which utilizes the ratio between features in a modified zero-inflated lognormal model. RAIDA removes possible problems associated with counts on different scales within and between conditions. As a result, its performance is not affected by the amount of difference in total abundances of DAFs across different conditions. In comprehensive simulation studies, our method performs consistently well, and in some situations RAIDA greatly surpasses other existing methods. We also apply RAIDA to real datasets of type II diabetes and find interesting results consistent with previous reports.
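As a rough illustration of the ratio idea, the sketch below divides one feature's counts by a reference feature's counts and then summarizes the ratios with a plain zero-inflated lognormal description. This is a simplified assumption for illustration only. It is not the modified model or the fitting procedure used by RAIDA, and it does not use the RAIDA R package. The data and the function name `fit_naive_zil` are hypothetical.

```python
import math
from statistics import mean, stdev

def fit_naive_zil(ratios):
    """Naive zero-inflated lognormal summary: the fraction of zeros, then the
    mean and standard deviation of the log of the nonzero ratios."""
    nonzero = [r for r in ratios if r > 0]
    eta = 1 - len(nonzero) / len(ratios)      # estimated zero fraction
    logs = [math.log(r) for r in nonzero]
    return eta, mean(logs), stdev(logs)

# Hypothetical counts of one feature and one reference feature in five samples
# with different sequencing depths.
feature = [0, 20, 45, 0, 160]
reference = [10, 10, 20, 40, 40]

# Ratio of the feature to the reference within each sample.
ratios = [f / r for f, r in zip(feature, reference)]

# Multiplying every count by the same depth factor leaves the ratios unchanged,
# so no separate scaling step is needed for them.
depth_factor = 3
scaled = [(f * depth_factor) / (r * depth_factor)
          for f, r in zip(feature, reference)]

eta, mu, sigma = fit_naive_zil(ratios)
print(ratios, eta, mu, sigma)
```

Because each ratio is computed within a single sample, it does not depend on that sample's total count, which is the property the abstract relies on.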

AVAILABILITY AND IMPLEMENTATION

An R package for RAIDA can be accessed from http://cals.arizona.edu/%7Eanling/sbg/software.htm.

Similar articles

An informative approach on differential abundance analysis for time-course metagenomic sequencing data.

Bioinformatics. 2017 May 1;33(9):1286-1292. doi: 10.1093/bioinformatics/btw828.

A novel normalization and differential abundance test framework for microbiome data.

Bioinformatics. 2020 Jul 1;36(13):3959-3965. doi: 10.1093/bioinformatics/btaa255.

A two-stage statistical procedure for feature selection and comparison in functional analysis of metagenomes.

Bioinformatics. 2015 Jan 15;31(2):158-65. doi: 10.1093/bioinformatics/btu635. Epub 2014 Sep 24.

MetaLonDA: a flexible R package for identifying time intervals of differentially abundant features in metagenomic longitudinal studies.

Microbiome. 2018 Feb 13;6(1):32. doi: 10.1186/s40168-018-0402-y.

An omnibus test for differential distribution analysis of microbiome sequencing data.

Bioinformatics. 2018 Feb 15;34(4):643-651. doi: 10.1093/bioinformatics/btx650.

MetaGen: reference-free learning with multiple metagenomic samples.

Genome Biol. 2017 Oct 3;18(1):187. doi: 10.1186/s13059-017-1323-y.

Subtractive assembly for comparative metagenomics, and its application to type 2 diabetes metagenomes.

Genome Biol. 2015 Nov 2;16:243. doi: 10.1186/s13059-015-0804-0.

Statistical evaluation of methods for identification of differentially abundant genes in comparative metagenomics.

BMC Genomics. 2016 Jan 25;17:78. doi: 10.1186/s12864-016-2386-y.

Phylogeny-based classification of microbial communities.

Bioinformatics. 2014 Feb 15;30(4):449-56. doi: 10.1093/bioinformatics/btt700. Epub 2013 Dec 24.

Cited by

Group-wise normalization in differential abundance analysis of microbiome samples.

BMC Bioinformatics. 2025 Jul 29;26(1):196. doi: 10.1186/s12859-025-06235-9.

Group-wise normalization in differential abundance analysis of microbiome samples.

ArXiv. 2024 Nov 23:arXiv:2411.15400v1.

Wise Roles and Future Visionary Endeavors of Current Emperor: Advancing Dynamic Methods for Longitudinal Microbiome Meta-Omics Data in Personalized and Precision Medicine.

Adv Sci (Weinh). 2024 Dec;11(47):e2400458. doi: 10.1002/advs.202400458. Epub 2024 Nov 13.

ADAPT: Analysis of Microbiome Differential Abundance by Pooling Tobit Models.

Bioinformatics. 2024 Nov 1;40(11). doi: 10.1093/bioinformatics/btae661.

TimeNorm: a novel normalization method for time course microbiome data.

Front Genet. 2024 Sep 24;15:1417533. doi: 10.3389/fgene.2024.1417533. eCollection 2024.

An optimal normalization method for high sparse compositional microbiome data.

PLoS Comput Biol. 2024 Aug 5;20(8):e1012338. doi: 10.1371/journal.pcbi.1012338. eCollection 2024 Aug.

A GLM-based zero-inflated generalized Poisson factor model for analyzing microbiome data.

Front Microbiol. 2024 May 30;15:1394204. doi: 10.3389/fmicb.2024.1394204. eCollection 2024.

ADAPT: Analysis of Microbiome Differential Abundance by Pooling Tobit Models.

bioRxiv. 2024 May 17:2024.05.14.594186. doi: 10.1101/2024.05.14.594186.

mbDecoda: a debiased approach to compositional data analysis for microbiome surveys.

Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae205.

Statistical normalization methods in microbiome data with application to microbiome cancer research.

Gut Microbes. 2023 Dec;15(2):2244139. doi: 10.1080/19490976.2023.2244139.
