Normalization and microbial differential abundance strategies depend upon data characteristics.

Author affiliations

Department of Chemical and Biological Engineering, University of Colorado at Boulder, Boulder, CO, 80309, USA.

Departments of Pediatrics, University of California San Diego, 9500 Gilman Drive, MC 0763, La Jolla, CA, 92093, USA.

Publication information

Microbiome. 2017 Mar 3;5(1):27. doi: 10.1186/s40168-017-0237-y.

DOI:10.1186/s40168-017-0237-y

PMID:28253908

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5335496/

Abstract

BACKGROUND

Data from 16S ribosomal RNA (rRNA) amplicon sequencing present challenges to ecological and statistical interpretation. In particular, library sizes often vary over several orders of magnitude, and the data contain many zeros. Although we are typically interested in comparing the relative abundance of taxa in the ecosystems of two or more groups, we can only measure the taxon relative abundance in specimens obtained from those ecosystems. Because comparing taxon relative abundance in the specimens is not equivalent to comparing taxon relative abundance in the ecosystems, this presents a special challenge. In addition, because the relative abundances of taxa in the specimen (as well as in the ecosystem) sum to 1, these are compositional data. Because compositional data are constrained to the simplex (sum to 1) and are not free to vary in Euclidean space, many standard methods of analysis are not applicable. Here, we evaluate how these challenges impact the performance of existing normalization methods and differential abundance analyses.
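The compositional constraint described above can be illustrated with a minimal Python sketch. The taxon names and counts are hypothetical and are not taken from the study; the point is only that when one taxon changes in absolute abundance, the proportions of all other taxa shift as well, because the proportions must sum to 1.

```python
def relative_abundance(counts):
    """Convert absolute counts to proportions that sum to 1 (closure)."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

# Hypothetical absolute abundances in two ecosystems; only taxon_A differs.
ecosystem_1 = {"taxon_A": 100, "taxon_B": 100, "taxon_C": 100}
ecosystem_2 = {"taxon_A": 700, "taxon_B": 100, "taxon_C": 100}

rel_1 = relative_abundance(ecosystem_1)
rel_2 = relative_abundance(ecosystem_2)

# taxon_B and taxon_C are unchanged in absolute terms, yet their
# proportions drop because taxon_A takes up more of the total.
for taxon in ecosystem_1:
    print(f"{taxon}: {rel_1[taxon]:.3f} -> {rel_2[taxon]:.3f}")
```

A test applied naively to these proportions could flag taxon_B and taxon_C as differing between groups even though only taxon_A changed, which is one way compositional effects can inflate false discoveries.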

RESULTS

Effects on normalization: Most normalization methods enable successful clustering of samples according to biological origin when the groups differ substantially in their overall microbial composition. Rarefying more clearly clusters samples according to biological origin than other normalization techniques do for ordination metrics based on presence or absence. Alternate normalization measures are potentially vulnerable to artifacts due to library size.

Effects on differential abundance testing: We build on a previous work to evaluate seven proposed statistical methods using rarefied as well as raw data. Our simulation studies suggest that the false discovery rates of many differential abundance-testing methods are not increased by rarefying itself, although of course rarefying results in a loss of sensitivity due to elimination of a portion of available data. For groups with large (~10×) differences in the average library size, rarefying lowers the false discovery rate. DESeq2, without addition of a constant, increased sensitivity on smaller datasets (<20 samples per group) but tends towards a higher false discovery rate with more samples, very uneven (~10×) library sizes, and/or compositional effects. For drawing inferences regarding taxon abundance in the ecosystem, analysis of composition of microbiomes (ANCOM) is not only very sensitive (for >20 samples per group) but also, critically, the only method tested that has good control of the false discovery rate.
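As a rough illustration of what rarefying does, the sketch below subsamples reads without replacement to a common depth using only the Python standard library. The function name `rarefy`, the sample counts, and the depth are assumptions made for this example; they do not reproduce the pipeline used in the study.

```python
import random

def rarefy(counts, depth, seed=None):
    """Subsample `depth` reads without replacement from one sample.

    Returns None when the sample has fewer than `depth` reads, mirroring
    the common practice of discarding samples below the chosen depth.
    """
    total = sum(counts.values())
    if total < depth:
        return None
    rng = random.Random(seed)
    # Expand counts into one label per read, then draw without replacement.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    drawn = rng.sample(reads, depth)
    return {taxon: drawn.count(taxon) for taxon in counts}

# Hypothetical samples whose library sizes differ tenfold.
small = {"taxon_A": 60, "taxon_B": 30, "taxon_C": 10}
large = {"taxon_A": 600, "taxon_B": 300, "taxon_C": 100}

depth = 100
rarefied_small = rarefy(small, depth, seed=0)
rarefied_large = rarefy(large, depth, seed=0)

print(rarefied_small, sum(rarefied_small.values()))
print(rarefied_large, sum(rarefied_large.values()))
```

After rarefying, both samples contain exactly `depth` reads, so library size no longer differs between them. The cost is visible in the larger sample, where 900 of 1,000 reads are discarded, which is the loss of sensitivity the abstract refers to.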

CONCLUSIONS

These findings guide which normalization and differential abundance techniques to use based on the data characteristics of a given study.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/430c/5335496/2b74bc90c5b1/40168_2017_237_Fig1_HTML.jpg

Similar articles

A comparison of sequencing platforms and bioinformatics pipelines for compositional analysis of the gut microbiome.

BMC Microbiol. 2017 Sep 13;17(1):194. doi: 10.1186/s12866-017-1101-8.

Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome studies.

Microbiome. 2016 Nov 25;4(1):62. doi: 10.1186/s40168-016-0208-8.

An empirical Bayes approach to normalization and differential abundance testing for microbiome data.

BMC Bioinformatics. 2020 Jun 3;21(1):225. doi: 10.1186/s12859-020-03552-z.

Normalization of environmental metagenomic DNA enhances the discovery of under-represented microbial community members.

Lett Appl Microbiol. 2015 Apr;60(4):359-66. doi: 10.1111/lam.12380. Epub 2015 Jan 15.

The bias associated with amplicon sequencing does not affect the quantitative assessment of bacterial community dynamics.

PLoS One. 2014 Jun 12;9(6):e99722. doi: 10.1371/journal.pone.0099722. eCollection 2014.

Towards Quantitative Microbiome Community Profiling Using Internal Standards.

Appl Environ Microbiol. 2019 Feb 20;85(5). doi: 10.1128/AEM.02634-18. Print 2019 Mar 1.

Improved normalization of species count data in ecology by scaling with ranked subsampling (SRS): application to microbial communities.

PeerJ. 2020 Aug 3;8:e9593. doi: 10.7717/peerj.9593. eCollection 2020.

Enhancing diversity analysis by repeatedly rarefying next generation sequencing data describing microbial communities.

Sci Rep. 2021 Nov 16;11(1):22302. doi: 10.1038/s41598-021-01636-1.

Tissue-Associated Bacterial Alterations in Rectal Carcinoma Patients Revealed by 16S rRNA Community Profiling.

Front Cell Infect Microbiol. 2016 Dec 9;6:179. doi: 10.3389/fcimb.2016.00179. eCollection 2016.

Cited by

Exploring bacterial and eukaryotic communities in the gut microbiota of urban and rural cats (Felis catus) in Colombia.

Vet Res Commun. 2025 Sep 15;49(6):312. doi: 10.1007/s11259-025-10831-8.

Detecting and mitigating doppelgänger bias in microbiome data: impacts on machine learning and disease classification.

Gut Microbes. 2025 Dec;17(1):2554196. doi: 10.1080/19490976.2025.2554196. Epub 2025 Sep 1.

mbSparse: an autoencoder-based imputation method to address sparsity in microbiome data.

Gut Microbes. 2025 Dec;17(1):2552347. doi: 10.1080/19490976.2025.2552347. Epub 2025 Sep 1.

The native soil microbiome is critical for early root-associated microbiota assembly and canola growth.

Environ Microbiome. 2025 Aug 26;20(1):112. doi: 10.1186/s40793-025-00774-7.

Assessment of absolute abundance in mother-infant gut microbiome using marine-sourced bacterial DNA spike-in and comparison with conventional quantification methods.

Microbiome Res Rep. 2025 Jun 9;4(2):23. doi: 10.20517/mrr.2024.94. eCollection 2025.

The microbiome is associated with obesity-related metabolome signature in the process of aging.

NPJ Biofilms Microbiomes. 2025 Aug 25;11(1):173. doi: 10.1038/s41522-025-00811-w.

Introducing the UK Crop Microbiome Cryobank data resource, AgMicrobiomeBase, with case studies and methods on metabarcoding analyses.

Environ Microbiome. 2025 Aug 21;20(1):108. doi: 10.1186/s40793-025-00768-5.

Explicit Scale Simulation for analysis of RNA-sequencing count data with ALDEx2.

NAR Genom Bioinform. 2025 Aug 19;7(3):lqaf108. doi: 10.1093/nargab/lqaf108. eCollection 2025 Sep.

Melody: meta-analysis of microbiome association studies for discovering generalizable microbial signatures.

Genome Biol. 2025 Aug 18;26(1):245. doi: 10.1186/s13059-025-03721-4.

Microbiome data: tell me which metrics and I will tell you which communities.

ISME Commun. 2025 Jul 24;5(1):ycaf125. doi: 10.1093/ismeco/ycaf125. eCollection 2025 Jan.

References

Analysis of composition of microbiomes: a novel method for studying microbial composition.

Microb Ecol Health Dis. 2015 May 29;26:27663. doi: 10.3402/mehd.v26.27663. eCollection 2015.

Proportionality: a valid alternative to correlation for relative data.

PLoS Comput Biol. 2015 Mar 16;11(3):e1004075. doi: 10.1371/journal.pcbi.1004075. eCollection 2015 Mar.

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.

Genome Biol. 2014;15(12):550. doi: 10.1186/s13059-014-0550-8.

Reagent and laboratory contamination can critically impact sequence-based microbiome analyses.

BMC Biol. 2014 Nov 12;12:87. doi: 10.1186/s12915-014-0087-z.

Evaluating bias of illumina-based bacterial 16S rRNA gene profiles.

Appl Environ Microbiol. 2014 Sep;80(18):5717-22. doi: 10.1128/AEM.01451-14. Epub 2014 Jul 7.

Diarrhea in young children from low-income countries leads to large-scale alterations in intestinal microbiota composition.

Genome Biol. 2014 Jun 27;15(6):R76. doi: 10.1186/gb-2014-15-6-r76.

Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis.

Microbiome. 2014 May 5;2:15. doi: 10.1186/2049-2618-2-15. eCollection 2014.

Estimating coverage in metagenomic data sets and why it matters.

ISME J. 2014 Nov;8(11):2349-51. doi: 10.1038/ismej.2014.76. Epub 2014 May 13.

Waste not, want not: why rarefying microbiome data is inadmissible.

PLoS Comput Biol. 2014 Apr 3;10(4):e1003531. doi: 10.1371/journal.pcbi.1003531. eCollection 2014 Apr.

A fair comparison.

Nat Methods. 2014 Apr;11(4):359. doi: 10.1038/nmeth.2897.
