

Statistical analysis of co-occurrence patterns in microbial presence-absence datasets.

Author information

Mainali Kumar P, Bewick Sharon, Thielen Peter, Mehoke Thomas, Breitwieser Florian P, Paudel Shishir, Adhikari Arjun, Wolfe Joshua, Slud Eric V, Karig David, Fagan William F

Affiliations

Department of Biology, University of Maryland, College Park, Maryland, United States of America.

Research and Exploratory Development Department, Johns Hopkins Applied Physics Laboratory, Laurel, Maryland, United States of America.

Publication information

PLoS One. 2017 Nov 16;12(11):e0187132. doi: 10.1371/journal.pone.0187132. eCollection 2017.

Abstract

Drawing on a long history in macroecology, correlation analysis of microbiome datasets is becoming a common practice for identifying relationships or shared ecological niches among bacterial taxa. However, many of the statistical issues that plague such analyses in macroscale communities remain unresolved for microbial communities. Here, we discuss problems in the analysis of microbial species correlations based on presence-absence data. We focus on presence-absence data because this information is more readily obtainable from sequencing studies, especially whole-genome sequencing, where abundance estimation is still in its infancy. First, we show how Pearson's correlation coefficient (r) and Jaccard's index (J), two of the most common metrics for correlation analysis of presence-absence data, can contradict each other when applied to a typical microbiome dataset. In our dataset, for example, 14% of species-pairs predicted to be significantly correlated by r were not predicted to be significantly correlated using J, while 37.4% of species-pairs predicted to be significantly correlated by J were not predicted to be significantly correlated using r. Mismatches were particularly common among species-pairs with at least one rare species (<10% prevalence), which explains why r and J might differ more strongly in microbiome datasets, where there are large numbers of rare taxa. Indeed, 74% of all species-pairs in our study had at least one rare species. Next, we show how Pearson's correlation coefficient can artificially inflate positive taxon relationships and why this is a particular problem for microbiome studies. We then illustrate how Jaccard's index of similarity (J) can improve on Pearson's correlation coefficient. However, the standard null model for Jaccard's index is flawed and thus introduces its own set of spurious conclusions.
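A minimal Python sketch of the two metrics may make the contrast concrete; the helper names and toy vectors are hypothetical and are not taken from the study's dataset. On 0/1 data, Pearson's r reduces to the phi coefficient of the 2x2 co-occurrence table, which counts sites where both species are absent (d), whereas Jaccard's index ignores those sites entirely.

```python
import math


def contingency(x, y):
    """2x2 co-occurrence counts for two equal-length presence-absence vectors (1 = present)."""
    if len(x) != len(y):
        raise ValueError("vectors must cover the same sites")
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)      # both species present
    b = sum(1 for xi, yi in zip(x, y) if xi and not yi)  # only species x present
    c = sum(1 for xi, yi in zip(x, y) if not xi and yi)  # only species y present
    d = len(x) - a - b - c                               # both species absent
    return a, b, c, d


def pearson_binary(x, y):
    """Pearson's r on 0/1 data, which equals the phi coefficient of the 2x2 table."""
    a, b, c, d = contingency(x, y)
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else float("nan")


def jaccard(x, y):
    """Jaccard's index: shared presences over sites where at least one species occurs."""
    a, b, c, _ = contingency(x, y)
    return a / (a + b + c) if (a + b + c) else float("nan")


# The same two occurrence patterns, embedded in 20 and then 100 sites.
# The 80 extra sites are joint absences: d grows while a, b and c are unchanged.
x_small = [1, 1] + [0] * 18
y_small = [1, 1, 1, 1] + [0] * 16
x_large = x_small + [0] * 80
y_large = y_small + [0] * 80

print(pearson_binary(x_small, y_small), jaccard(x_small, y_small))
print(pearson_binary(x_large, y_large), jaccard(x_large, y_large))
```

In this toy case, padding both vectors with sites where neither species occurs raises r from 2/3 to about 0.70 while J stays at 0.5. It is one simple illustration of how joint absences, which are plentiful when at least one species is rare, can push r upward.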
We therefore identify a better null model based on a hypergeometric distribution, which appropriately corrects for species prevalence. This model is available in the recent statistics literature and can be used to evaluate the significance of any empirically observed value of Jaccard's index. The resulting simple yet effective method for correlation analysis of microbial presence-absence datasets provides a robust means of testing for and finding relationships and/or shared environmental responses among microbial taxa.
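The general idea of a prevalence-aware null for J can be sketched as follows. This assumes (our illustration, not necessarily the exact procedure or software used by the authors) that the null fixes the number of sites n and each species' prevalence m1 and m2, and places species 2's occurrences uniformly at random. The shared-site count K is then hypergeometric, P(K = k) = C(m1, k) C(n - m1, m2 - k) / C(n, m2). Because J = k / (m1 + m2 - k) increases with k, tail probabilities for J reduce to tail probabilities for K. The function names are hypothetical.

```python
from math import comb


def overlap_pmf(n, m1, m2):
    """Hypergeometric null distribution of the number of shared sites K.

    Assumes species 2's m2 occupied sites are a uniformly random subset of
    the n sites, m1 of which are occupied by species 1.
    """
    if not (1 <= m1 <= n and 1 <= m2 <= n):
        raise ValueError("prevalences must lie between 1 and n")
    total = comb(n, m2)
    lo, hi = max(0, m1 + m2 - n), min(m1, m2)
    return {k: comb(m1, k) * comb(n - m1, m2 - k) / total for k in range(lo, hi + 1)}


def jaccard_null_test(n, m1, m2, k_obs):
    """Expected J under the null plus one-sided p-values for the observed overlap.

    For fixed m1 and m2, J = k / (m1 + m2 - k) is increasing in k, so
    P(J >= J_obs) = P(K >= k_obs) and P(J <= J_obs) = P(K <= k_obs).
    """
    pmf = overlap_pmf(n, m1, m2)
    if k_obs not in pmf:
        raise ValueError("observed overlap is impossible for these prevalences")
    expected_j = sum(p * k / (m1 + m2 - k) for k, p in pmf.items())
    p_upper = sum(p for k, p in pmf.items() if k >= k_obs)  # more co-occurrence than chance
    p_lower = sum(p for k, p in pmf.items() if k <= k_obs)  # less co-occurrence than chance
    return expected_j, p_upper, p_lower


# Hypothetical example: 20 sites, species at 2 and 4 sites, sharing both of the first species' sites.
expected_j, p_upper, p_lower = jaccard_null_test(n=20, m1=2, m2=4, k_obs=2)
print(expected_j, p_upper, p_lower)
```

For this example the observed J is 0.5, the null expectation of J is only about 0.083, and the upper-tail p-value is 153/4845 = 3/95 (about 0.032). The expected J under the null changes with m1 and m2, which is why a prevalence-aware reference distribution matters when many taxa are rare.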


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d9f4/5689832/1267739cb99a/pone.0187132.g001.jpg
