Log-ratio analysis of microbiome data with many zeroes is library size dependent.

Affiliations

Biometris, Wageningen University & Research, Wageningen, The Netherlands.

Biointeractions and Plant Health, Wageningen University & Research, Wageningen, The Netherlands.

Publication details

Mol Ecol Resour. 2021 Aug;21(6):1866-1874. doi: 10.1111/1755-0998.13391. Epub 2021 May 3.

DOI:10.1111/1755-0998.13391

PMID:33763959

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8360050/

Abstract

Microbiome composition data collected through amplicon sequencing are count data on taxa in which the total count per sample (the library size) is an artefact of the sequencing platform, and as a result, such data are compositional. To avoid library size dependency, one common way of analysing multivariate compositional data is to perform a principal component analysis (PCA) on data transformed with the centred log-ratio, hereafter called a log-ratio PCA. Two aspects typical of amplicon sequencing data are the large differences in library size and the large number of zeroes. In this study, we show on real data and by simulation that, applied to data that combine these two aspects, log-ratio PCA is nevertheless heavily dependent on the library size. This leads to a reduction in power when testing against any explanatory variable in log-ratio redundancy analysis. If there is additionally a correlation between the library size and the explanatory variable, then the type 1 error becomes inflated. We explore putative solutions to this problem.
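The abstract's central claim can be illustrated with a small toy simulation. This is only a sketch under stated assumptions and does not reproduce the authors' real-data analysis or simulation design. Every sample is drawn from ONE shared composition, so no biological difference exists between samples. Only the library size varies. The counts are then clr-transformed, and PCA is run on the result ("log-ratio PCA"). The following choices are all hypothetical and are not taken from the paper:

- the skewed composition (`0.85 ** j` weights over 50 taxa),
- the library-size range (200 to 20 000 reads),
- the pseudocount of 1 used to replace zeros,
- the stdlib power-iteration PCA.

If log-ratio PCA were free of library-size effects, the first principal component would not track library size in this null setting.

```python
import math
import random


def multinomial(rng, n, probs):
    """Draw one multinomial count vector with total n, using only the stdlib."""
    counts = [0] * len(probs)
    for j in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[j] += 1
    return counts


def clr(counts, pseudocount=1.0):
    """Centred log-ratio transform; zeros handled by adding a pseudocount (assumption)."""
    logs = [math.log(c + pseudocount) for c in counts]
    centre = sum(logs) / len(logs)
    return [v - centre for v in logs]


def pc1_scores(rows, rng, iterations=500):
    """Sample scores on the first principal component, found by power iteration."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    x = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Random start: the all-ones vector lies in the null space of a clr
    # covariance matrix (every clr row sums to zero), so it cannot be used.
    v = [rng.random() for _ in range(p)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return [sum(xi[j] * v[j] for j in range(p)) for xi in x]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_samples=40, n_taxa=50, seed=1):
    """Null scenario: all samples share one composition; only depth varies."""
    rng = random.Random(seed)
    # Hypothetical skewed composition: a few abundant taxa, a long rare tail.
    weights = [0.85 ** j for j in range(n_taxa)]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Hypothetical library sizes, log-uniform between 200 and 20 000 reads.
    sizes = [int(round(10 ** rng.uniform(math.log10(200), math.log10(20000))))
             for _ in range(n_samples)]
    counts = [multinomial(rng, n, probs) for n in sizes]
    zero_frac = sum(c == 0 for row in counts for c in row) / (n_samples * n_taxa)
    scores = pc1_scores([clr(row) for row in counts], rng)
    r = pearson(scores, [math.log(n) for n in sizes])
    return r, zero_frac


if __name__ == "__main__":
    r, zero_frac = simulate()
    print(f"fraction of zero counts: {zero_frac:.2f}")
    print(f"|corr(PC1 score, log library size)|: {abs(r):.2f}")
```

Without zeros and without a pseudocount, the clr is scale invariant. For example, `clr([50, 30, 20], 0.0)` and `clr([5000, 3000, 2000], 0.0)` return the same vector. In this toy run, the library-size signal therefore enters only through zeros in shallow samples, which are all mapped to the same pseudocount. Rare taxa stay pinned near `log(1)` while abundant taxa scale with depth.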

Similar articles

Poisson PCA: Poisson measurement error corrected PCA, with application to microbiome data.

Biometrics. 2021 Dec;77(4):1369-1384. doi: 10.1111/biom.13384. Epub 2020 Oct 19.

Simple and flexible sign and rank-based methods for testing for differential abundance in microbiome studies.

PLoS One. 2023 Sep 26;18(9):e0292055. doi: 10.1371/journal.pone.0292055. eCollection 2023.

A comparison of sequencing platforms and bioinformatics pipelines for compositional analysis of the gut microbiome.

BMC Microbiol. 2017 Sep 13;17(1):194. doi: 10.1186/s12866-017-1101-8.

mbDenoise: microbiome data denoising using zero-inflated probabilistic principal components analysis.

Genome Biol. 2022 Apr 14;23(1):94. doi: 10.1186/s13059-022-02657-3.

A distance based multisample test for high-dimensional compositional data with applications to the human microbiome.

BMC Bioinformatics. 2020 Dec 3;21(Suppl 9):205. doi: 10.1186/s12859-020-3530-x.

GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data.

PeerJ. 2018 Apr 2;6:e4600. doi: 10.7717/peerj.4600. eCollection 2018.

Compositional knockoff filter for high-dimensional regression analysis of microbiome data.

Biometrics. 2021 Sep;77(3):984-995. doi: 10.1111/biom.13336. Epub 2020 Jul 25.

Zero-Inflated gaussian mixed models for analyzing longitudinal microbiome data.

PLoS One. 2020 Nov 9;15(11):e0242073. doi: 10.1371/journal.pone.0242073. eCollection 2020.

Generalized Hotelling's test for paired compositional data with application to human microbiome studies.

Genet Epidemiol. 2018 Jul;42(5):459-469. doi: 10.1002/gepi.22127. Epub 2018 May 7.

Cited by

parafac4microbiome: exploratory analysis of longitudinal microbiome data using parallel factor analysis.

mSystems. 2025 Jun 17;10(6):e0047225. doi: 10.1128/msystems.00472-25. Epub 2025 May 21.

Score matching for differential abundance testing of compositional high-throughput sequencing data.

bioRxiv. 2024 Dec 9:2024.12.05.627006. doi: 10.1101/2024.12.05.627006.

Rarefaction is currently the best approach to control for uneven sequencing effort in amplicon sequence analyses.

mSphere. 2024 Feb 28;9(2):e0035423. doi: 10.1128/msphere.00354-23. Epub 2024 Jan 22.

Waste not, want not: revisiting the analysis that called into question the practice of rarefaction.

mSphere. 2024 Jan 30;9(1):e0035523. doi: 10.1128/msphere.00355-23. Epub 2023 Dec 6.

Novel Application of Survival Models for Predicting Microbial Community Transitions with Variable Selection for Environmental DNA.

Appl Environ Microbiol. 2022 Mar 22;88(6):e0214621. doi: 10.1128/AEM.02146-21. Epub 2022 Feb 9.
