Statistical analysis of co-occurrence patterns in microbial presence-absence datasets.

Author information

Mainali Kumar P, Bewick Sharon, Thielen Peter, Mehoke Thomas, Breitwieser Florian P, Paudel Shishir, Adhikari Arjun, Wolfe Joshua, Slud Eric V, Karig David, Fagan William F

Affiliations

Department of Biology, University of Maryland, College Park, Maryland, United States of America.

Research and Exploratory Development Department, Johns Hopkins Applied Physics Laboratory, Laurel, Maryland, United States of America.

Publication information

PLoS One. 2017 Nov 16;12(11):e0187132. doi: 10.1371/journal.pone.0187132. eCollection 2017.

DOI:10.1371/journal.pone.0187132

PMID:29145425

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5689832/

Abstract

Drawing on a long history in macroecology, correlation analysis of microbiome datasets is becoming a common practice for identifying relationships or shared ecological niches among bacterial taxa. However, many of the statistical issues that plague such analyses in macroscale communities remain unresolved for microbial communities. Here, we discuss problems in the analysis of microbial species correlations based on presence-absence data. We focus on presence-absence data because this information is more readily obtainable from sequencing studies, especially for whole-genome sequencing, where abundance estimation is still in its infancy. First, we show how Pearson's correlation coefficient (r) and Jaccard's index (J), two of the most common metrics for correlation analysis of presence-absence data, can contradict each other when applied to a typical microbiome dataset. In our dataset, for example, 14% of species-pairs predicted to be significantly correlated by r were not predicted to be significantly correlated using J, while 37.4% of species-pairs predicted to be significantly correlated by J were not predicted to be significantly correlated using r. Mismatch was particularly common among species-pairs with at least one rare species (<10% prevalence), explaining why r and J might differ more strongly in microbiome datasets, where there are large numbers of rare taxa. Indeed, 74% of all species-pairs in our study had at least one rare species. Next, we show how Pearson's correlation coefficient can result in artificial inflation of positive taxon relationships and how this is a particular problem for microbiome studies. We then illustrate how Jaccard's index of similarity (J) can yield improvements over Pearson's correlation coefficient. However, the standard null model for Jaccard's index is flawed, and thus introduces its own set of spurious conclusions. We thus identify a better null model based on a hypergeometric distribution, which appropriately corrects for species prevalence. This model is available from recent statistics literature, and can be used for evaluating the significance of any value of an empirically observed Jaccard's index. The resulting simple, yet effective method for handling correlation analysis of microbial presence-absence datasets provides a robust means of testing and finding relationships and/or shared environmental responses among microbial taxa.
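As a rough illustration of the quantities named in the abstract, the following Python sketch computes Jaccard's index and Pearson's r for two presence-absence vectors, then evaluates the observed co-occurrence against a hypergeometric null that holds each species' prevalence fixed. The function names (`counts`, `jaccard`, `pearson_binary`, `upper_tail_p`) are hypothetical and do not come from the paper, and the one-sided upper-tail test is an assumption about how significance might be assessed, not necessarily the authors' exact procedure. It relies on the fact that, for fixed prevalences, J increases with the number of shared sites, so a tail probability on the co-occurrence count is also a tail probability on J.

```python
from math import comb, sqrt


def counts(x, y):
    """Return (n, a, b, k): sites, prevalence of each species, shared sites."""
    if len(x) != len(y):
        raise ValueError("presence-absence vectors must have equal length")
    n = len(x)
    a = sum(1 for xi in x if xi)
    b = sum(1 for yi in y if yi)
    k = sum(1 for xi, yi in zip(x, y) if xi and yi)
    return n, a, b, k


def jaccard(a, b, k):
    """Jaccard's index: shared sites divided by sites holding either species."""
    union = a + b - k
    return k / union if union else float("nan")


def pearson_binary(n, a, b, k):
    """Pearson's r for two 0/1 vectors (equivalent to the phi coefficient)."""
    denom = sqrt(a * b * (n - a) * (n - b))
    return (n * k - a * b) / denom if denom else float("nan")


def upper_tail_p(n, a, b, k):
    """P(K >= k) under a hypergeometric null with prevalences a and b fixed.

    Assumption: species B's b sites are a random draw without replacement
    from n sites, a of which hold species A.
    """
    hits = sum(comb(a, i) * comb(n - a, b - i) for i in range(k, min(a, b) + 1))
    return hits / comb(n, b)


# Two rare species, each present at 2 of 20 sites, always found together.
rare_a = [1, 1] + [0] * 18
rare_b = [1, 1] + [0] * 18
n, a, b, k = counts(rare_a, rare_b)
print(jaccard(a, b, k), pearson_binary(n, a, b, k), upper_tail_p(n, a, b, k))
```

In this toy case both indices equal 1, while the tail probability is 1/190, which shows how a null model that conditions on prevalence puts a perfect overlap between two rare taxa into context.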

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d9f4/5689832/1267739cb99a/pone.0187132.g001.jpg
