Tsilimigras Matthew C B, Fodor Anthony A
Department of Bioinformatics and Genomics, UNC Charlotte, Bioinformatics Building, The University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte.
Ann Epidemiol. 2016 May;26(5):330-5. doi: 10.1016/j.annepidem.2016.03.002. Epub 2016 Mar 31.
Human microbiome studies fall within the realm of compositional data, because the absolute abundances of microbes are not recoverable from sequence data alone. In compositional data analysis, each sample consists of proportions of various organisms whose sum is constrained to a constant. This simple feature can cause traditional statistical treatments, when naively applied, to produce erroneous results and spurious correlations.
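The effect of the constant-sum constraint can be illustrated with a small sketch. The abundances below are hypothetical values chosen for illustration, not data from any study: taxa A and B are exactly uncorrelated in absolute terms, while a third taxon C swings between rare and dominant. Converting each sample to proportions (the closure operation) is enough to make A and B appear strongly correlated.

```python
from itertools import product
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def closure(sample):
    """Rescale one sample so that its parts sum to 1."""
    total = sum(sample)
    return [value / total for value in sample]


# Hypothetical absolute abundances (A, B, C) in a balanced design:
# every combination of A and B occurs with both a rare and a dominant C,
# so A and B are uncorrelated by construction.
samples = list(product([10, 20], [10, 20], [10, 1000]))
abs_a = [s[0] for s in samples]
abs_b = [s[1] for s in samples]

# What sequencing reports: proportions only.
props = [closure(s) for s in samples]
rel_a = [p[0] for p in props]
rel_b = [p[1] for p in props]

r_absolute = pearson(abs_a, abs_b)
r_relative = pearson(rel_a, rel_b)

print(f"correlation of A and B, absolute abundances: {r_absolute:.3f}")
print(f"correlation of A and B, proportions:         {r_relative:.3f}")
```

The correlation on absolute abundances is zero, while the correlation on proportions is strongly positive: both A and B shrink as a share of the total whenever C blooms, even though neither has changed in a related way.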
We review the origins of compositionality in microbiome data, the theory and usage of compositional data analysis in this setting and some recent attempts at solutions to these problems.
Microbiome sequence data sets are typically high dimensional, with the number of taxa much greater than the number of samples, and sparse as most taxa are only observed in a small number of samples. These features of microbiome sequence data interact with compositionality to produce additional challenges in analysis.
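One way sparsity interacts with compositionality can be sketched with the centered log-ratio (CLR) transform, a standard tool of compositional data analysis. The abstract does not name a particular transform, so the choice of CLR, the counts, and the pseudocount values below are all assumptions made for illustration. Because log-ratios are undefined for zeros, a sparse sample cannot be transformed without first replacing its zeros, and the result then depends on the replacement chosen.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample.

    A pseudocount is added to every count before taking logs; this is one
    common zero-handling choice among several, assumed here for illustration.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical sparse sample: most taxa were not observed.
sample = [120, 0, 3, 0, 0, 45]

# Without zero replacement the transform fails outright.
try:
    clr(sample, pseudocount=0)
    failed_on_zeros = False
except ValueError:
    failed_on_zeros = True

# With zero replacement it works, but the output depends on the pseudocount.
clr_half = clr(sample, pseudocount=0.5)
clr_one = clr(sample, pseudocount=1.0)

print("fails without a pseudocount:", failed_on_zeros)
print("log-ratio of taxon 1 to taxon 2, pseudocount 0.5:", clr_half[0] - clr_half[1])
print("log-ratio of taxon 1 to taxon 2, pseudocount 1.0:", clr_one[0] - clr_one[1])
```

The log-ratio between an abundant taxon and an unobserved one is set largely by the pseudocount rather than by the data, which is one reason sparse, high-dimensional samples remain difficult even after transformation.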
Despite sophisticated approaches to statistical transformation, the analysis of compositional data may remain a partially intractable problem, limiting inference. We suggest that current research needs include better generation of simulated data and further study of how the severity of compositional effects changes when sampling microbial communities of widely differing diversity.